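Subqueries appear frequently in SQL statements and tend to be expensive, so optimizing them has a direct effect on query performance. A subquery can appear in the target list, the FROM clause, the WHERE clause, a JOIN/ON clause, the GROUP BY clause, the HAVING clause, the ORDER BY clause, and so on. The position affects optimization as follows: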
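- Target list: a subquery in the target list must be a scalar subquery; otherwise the database may report an error similar to "ERROR: subquery must return only one column".
- FROM clause: a correlated subquery (one whose execution depends on attribute values of the outer query, so it has to be re-executed whenever those values change) placed in the FROM clause may be rejected with a message similar to "subquery in FROM cannot refer to other relations of same query level", so a correlated subquery cannot appear in the FROM clause (unless it is marked LATERAL, which newer PostgreSQL versions support). A non-correlated subquery (one whose execution does not depend on any attribute of the outer query, so it can be solved independently and planned before the outer query) in the FROM clause can be pulled up into the parent query, so that its relations take part in join-order selection and the cheapest plan can be chosen across all joined tables.
- WHERE clause: a subquery in the WHERE clause is part of a conditional expression, and an expression breaks down into operators and operands. The operators differ with the data types involved, and each places its own requirements on the subquery (for example, an equality comparison on an INT value requires a scalar subquery). A subquery in WHERE can also be introduced by predicates such as IN, BETWEEN, and EXISTS.
- JOIN/ON clause: a JOIN/ON clause splits into two parts. The JOIN part behaves like a FROM clause and the ON part behaves like a WHERE clause. Both parts may contain subqueries, and they are handled the same way as in FROM and WHERE respectively.
- GROUP BY clause: the target list must be consistent with the GROUP BY columns. A subquery can be written in GROUP BY, but doing so has no practical use.
- ORDER BY clause: a subquery can be written in ORDER BY, but ORDER BY applies to the whole SQL statement, so a subquery there has no practical use.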
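PostgreSQL divides the subqueries described above into two categories. A subquery (SubQuery) normally exists as a range table entry (select * from student, (select * from score) as sc); a sublink (SubLink) exists as an expression (SELECT (select avg(degree) from score), sname FROM STUDENT). In practice the two can be told apart by position: a sub-select that follows the FROM keyword is a subquery, while a sub-select that appears in a WHERE/ON condition or in the projection is a sublink. In the words of the source comment: A SubLink represents a subselect appearing in an expression, and in some cases also the combining operator(s) just above it.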
The predicate forms specific to sublinks are listed below:
Type | Expression form | Meaning | Where it is produced |
---|---|---|---|
EXISTS_SUBLINK | EXISTS (SELECT …) | EXISTS predicate | EXISTS select_with_parens |
ALL_SUBLINK | (lefthand) op ALL (SELECT …) | ALL predicate | a_expr subquery_Op sub_type select_with_parens, or a_expr subquery_Op sub_type '(' a_expr ')' |
ANY_SUBLINK | (lefthand) op ANY (SELECT …) | ANY/IN/SOME predicate | a_expr IN_P in_expr, or a_expr NOT_LA IN_P in_expr, or a_expr subquery_Op sub_type select_with_parens, or a_expr subquery_Op sub_type '(' a_expr ')' |
ROWCOMPARE_SUBLINK | (lefthand) op (SELECT …) | Row comparison | An EXPR_SUBLINK is converted to ROWCOMPARE_SUBLINK when IsA(lexpr, RowExpr), IsA(rexpr, SubLink) and ((SubLink *) rexpr)->subLinkType == EXPR_SUBLINK |
EXPR_SUBLINK | (SELECT with single targetlist item …) | Scalar subquery | select_with_parens, or select_with_parens indirection; also generated at plan creation via create_plan_recurse --> create_minmaxagg_plan --> SS_make_initplan_from_plan |
MULTIEXPR_SUBLINK | (SELECT with multiple targetlist items …) | Multiple-assignment source in an UPDATE SET list | transformExprRecurse --> transformMultiAssignRef converts the sublink when ((SubLink *) maref->source)->subLinkType == EXPR_SUBLINK |
ARRAY_SUBLINK | ARRAY(SELECT with single targetlist item …) | ARRAY constructor | ARRAY select_with_parens |
CTE_SUBLINK | WITH query (never actually part of an expression) | WITH query | SS_process_ctes --> CTE_SUBLINK |
The source comments describe the semantics of these types as follows. For ALL, ANY, and ROWCOMPARE, the lefthand is a list of expressions of the same length as the subselect's targetlist. ROWCOMPARE will always have a list with more than one entry; if the subselect has just one target then the parser will create an EXPR_SUBLINK instead (and any operator above the subselect will be represented separately). ROWCOMPARE, EXPR, and MULTIEXPR require the subselect to deliver at most one row (if it returns no rows, the result is NULL). ALL, ANY, and ROWCOMPARE require the combining operators to deliver boolean results. ALL and ANY combine the per-row results using AND and OR semantics respectively. ARRAY requires just one target column, and creates an array of the target column's type using any number of rows resulting from the subselect.
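The AND/OR combination rule for ALL and ANY is easier to see with a small three-valued-logic model. The sketch below is not PostgreSQL code: the `TriBool` type and the functions `tri_or`, `tri_and`, `combine_any`, and `combine_all` are hypothetical names introduced only for illustration. Each array element stands for the result of applying the combining operator to one row of the subselect, and the empty-input results follow from the identity elements of OR and AND.

```c
#include <assert.h>
#include <stddef.h>

/* SQL three-valued logic: a comparison involving NULL yields "unknown". */
typedef enum TriBool { TRI_FALSE = 0, TRI_TRUE = 1, TRI_NULL = 2 } TriBool;

/* OR: TRUE wins, then NULL, otherwise FALSE. */
TriBool tri_or(TriBool a, TriBool b)
{
    if (a == TRI_TRUE || b == TRI_TRUE) return TRI_TRUE;
    if (a == TRI_NULL || b == TRI_NULL) return TRI_NULL;
    return TRI_FALSE;
}

/* AND: FALSE wins, then NULL, otherwise TRUE. */
TriBool tri_and(TriBool a, TriBool b)
{
    if (a == TRI_FALSE || b == TRI_FALSE) return TRI_FALSE;
    if (a == TRI_NULL || b == TRI_NULL) return TRI_NULL;
    return TRI_TRUE;
}

/* ANY: OR over the per-row results; no rows gives FALSE. */
TriBool combine_any(const TriBool *rows, size_t n)
{
    TriBool acc = TRI_FALSE;
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = tri_or(acc, rows[i]);
    return acc;
}

/* ALL: AND over the per-row results; no rows gives TRUE. */
TriBool combine_all(const TriBool *rows, size_t n)
{
    TriBool acc = TRI_TRUE;
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = tri_and(acc, rows[i]);
    return acc;
}
```

With the per-row results {FALSE, NULL}, `combine_any` yields `TRI_NULL` while `combine_all` yields `TRI_FALSE`. A WHERE clause treats NULL and FALSE alike at its top level, but an enclosing expression such as NOT does not, which is why the distinction matters.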
SubLink is classed as an Expr node, but it is not actually executable; it must be replaced in the expression tree by a SubPlan node during planning.
NOTE: in the raw output of gram.y, testexpr contains just the raw form of the lefthand expression (if any), and operName is the String name of the combining operator. Also, subselect is a raw parsetree. During parse analysis, the parser transforms testexpr into a complete boolean expression that compares the lefthand value(s) to PARAM_SUBLINK nodes representing the output columns of the subselect. And subselect is transformed to a Query. This is the representation seen in saved rules and in the rewriter. In EXISTS, EXPR, MULTIEXPR, and ARRAY SubLinks, testexpr and operName are unused and are always null.
subLinkId is currently used only for MULTIEXPR SubLinks, and is zero in other SubLinks. This number identifies different multiple-assignment subqueries within an UPDATE statement's SET list. It is unique only within a particular targetlist. The output column(s) of the MULTIEXPR are referenced by PARAM_MULTIEXPR Params appearing elsewhere in the tlist.
The CTE_SUBLINK case never occurs in actual SubLink nodes, but it is used in SubPlans generated for WITH subqueries.
PostgreSQL attempts pull-up mainly for two sublink types: ANY_SUBLINK and EXISTS_SUBLINK.
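Predicate | Form | Description |
---|---|---|
IN | LH IN EXPR | If pulled up, becomes a semi join |
NOT IN | LH NOT IN EXPR | Not pulled up here: the qual recursion only looks through NOT when its argument is an EXISTS sublink, because NOT IN treats NULLs differently from an anti join |
ANY/SOME | LH OP ANY EXPR | If pulled up, becomes a semi join |
[NOT] EXISTS | [NOT] EXISTS EXPR | If pulled up, EXISTS becomes a semi join and NOT EXISTS becomes an anti join |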
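
The decision of which sublinks are candidates for pull-up can be summarized in a few lines. The sketch below is a simplification under stated assumptions: the `SubLinkType` enum repeats the member names from the table above, while `PullUpKind` and `classify_pullup` are hypothetical names that do not exist in PostgreSQL. It follows my reading of the qual recursion, where NOT is only looked through for EXISTS, and it ignores the further checks the real conversion functions perform (for example, which relations the sublink references).

```c
#include <assert.h>
#include <stdbool.h>

/* Sublink kinds, named as in the table above. */
typedef enum SubLinkType {
    EXISTS_SUBLINK,
    ALL_SUBLINK,
    ANY_SUBLINK,
    ROWCOMPARE_SUBLINK,
    EXPR_SUBLINK,
    MULTIEXPR_SUBLINK,
    ARRAY_SUBLINK,
    CTE_SUBLINK
} SubLinkType;

/* Hypothetical result type: what a successful pull-up would produce. */
typedef enum PullUpKind {
    PULLUP_NONE,       /* leave the sublink in the expression tree */
    PULLUP_SEMI_JOIN,  /* candidate for a semi join */
    PULLUP_ANTI_JOIN   /* candidate for an anti join */
} PullUpKind;

/*
 * Classify a top-level qual item. under_not is true when the sublink is the
 * immediate argument of a NOT clause.
 */
PullUpKind classify_pullup(SubLinkType type, bool under_not)
{
    assert((int) type <= (int) CTE_SUBLINK);

    if (type == EXISTS_SUBLINK)
        return under_not ? PULLUP_ANTI_JOIN : PULLUP_SEMI_JOIN;
    if (type == ANY_SUBLINK && !under_not)
        return PULLUP_SEMI_JOIN;   /* IN, = ANY, = SOME */
    return PULLUP_NONE;            /* includes NOT IN, ALL, scalar sublinks */
}
```

For example, `classify_pullup(ANY_SUBLINK, true)` models NOT IN and reports `PULLUP_NONE`, whereas `classify_pullup(EXISTS_SUBLINK, true)` models NOT EXISTS and reports `PULLUP_ANTI_JOIN`.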
pull_up_sublinks
This article focuses on the function pull_up_sublinks, which attempts to pull up ANY and EXISTS SubLinks to be treated as semijoins or anti-semijoins. Its header comment explains the idea:

A clause "foo op ANY (sub-SELECT)" can be processed by pulling the sub-SELECT up to become a rangetable entry and treating the implied comparisons as quals of a semijoin. However, this optimization only works at the top level of WHERE or a JOIN/ON clause, because we cannot distinguish whether the ANY ought to return FALSE or NULL in cases involving NULL inputs. Also, in an outer join's ON clause we can only do this if the sublink is degenerate (ie, references only the nullable side of the join). In that case it is legal to push the semijoin down into the nullable side of the join. If the sublink references any nonnullable-side variables then it would have to be evaluated as part of the outer join, which makes things way too complicated.

Under similar conditions, EXISTS and NOT EXISTS clauses can be handled by pulling up the sub-SELECT and creating a semijoin or anti-semijoin.

This routine searches for such clauses and does the necessary parsetree transformations if any are found.

This routine has to run before preprocess_expression(), so the quals clauses are not yet reduced to implicit-AND format, and are not guaranteed to be AND/OR-flat either. That means we need to recursively search through explicit AND clauses. We stop as soon as we hit a non-AND item.
void pull_up_sublinks(PlannerInfo *root) {
Relids relids;
/* Begin recursion through the jointree */
Node *jtnode = pull_up_sublinks_jointree_recurse(root,
(Node *) root->parse->jointree, /* the part of the query tree that represents the FROM and WHERE clauses */
&relids);
/* root->parse->jointree must always be a FromExpr, so insert a dummy one if we got a bare RangeTblRef or JoinExpr out of the recursion. */
if (IsA(jtnode, FromExpr)) root->parse->jointree = (FromExpr *) jtnode;
else root->parse->jointree = makeFromExpr(list_make1(jtnode), NULL);
}
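The last step of pull_up_sublinks keeps the invariant that the top of the jointree is a FromExpr. The recursion can hand back something else, for instance when a pulled-up sublink has been stacked as a new JoinExpr above the rebuilt FromExpr. The toy model below illustrates only that wrapping rule; `ToyTag`, `ToyNode`, and `toy_ensure_fromexpr` are hypothetical names, and the single `child` pointer stands in for the real fromlist.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node tags standing in for the three jointree node kinds. */
typedef enum ToyTag { TOY_RANGETBLREF, TOY_JOINEXPR, TOY_FROMEXPR } ToyTag;

typedef struct ToyNode {
    ToyTag tag;
    struct ToyNode *child;  /* simplified: one child instead of a list */
} ToyNode;

/*
 * Return a tree whose root is a FromExpr. A node that already is one is
 * returned unchanged; anything else is wrapped in a newly allocated dummy
 * FromExpr, which the caller must free.
 */
ToyNode *toy_ensure_fromexpr(ToyNode *jtnode)
{
    ToyNode *wrapper;

    assert(jtnode != NULL);
    if (jtnode->tag == TOY_FROMEXPR)
        return jtnode;

    wrapper = malloc(sizeof(ToyNode));
    if (wrapper == NULL)
        abort();
    wrapper->tag = TOY_FROMEXPR;
    wrapper->child = jtnode;
    return wrapper;
}
```

Calling `toy_ensure_fromexpr` a second time on its own result returns the same pointer, so the wrapping never nests.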
pull_up_sublinks_jointree_recurse
static Node *pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode, Relids *relids)
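This function recursively pulls up the sublinks (IN and [NOT] EXISTS) found anywhere in the jointree. For FromExpr and JoinExpr nodes it does two things: it calls itself recursively to handle sublinks in the child nodes, and it calls pull_up_sublinks_qual_recurse to process the node's quals.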
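The simplest case is when jtnode is a RangeTblRef, that is, a reference to an entry in the query's rangetable. A range table entry (RangeTblEntry) represents an object being queried: an ordinary relation, a sub-select in FROM, or the result of a JOIN clause.
There is nothing to pull up in this case. The node is returned unmodified, and its range table index becomes the single member of the returned relids set.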
else if (IsA(jtnode, RangeTblRef)){
int varno = ((RangeTblRef *) jtnode)->rtindex;
*relids = bms_make_singleton(varno); /* jtnode is returned unmodified */
}
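Conceptually, a Relids value is a set of range table indexes. The sketch below models such a set with a 64-bit mask instead of PostgreSQL's Bitmapset, which is a deliberate simplification: `ToyRelids` and the `toy_*` functions are made-up names for illustration, and real range tables are not limited to 63 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i is set when range table index i is a member of the set. */
typedef uint64_t ToyRelids;

/* A set containing only rtindex (range table indexes start at 1). */
ToyRelids toy_singleton(int rtindex)
{
    assert(rtindex >= 1 && rtindex <= 63);
    return (ToyRelids) 1 << rtindex;
}

/* Union of two sets. */
ToyRelids toy_union(ToyRelids a, ToyRelids b)
{
    return a | b;
}

/* Is rtindex a member of set? */
bool toy_is_member(int rtindex, ToyRelids set)
{
    assert(rtindex >= 1 && rtindex <= 63);
    return (set & ((ToyRelids) 1 << rtindex)) != 0;
}

/* Is every member of a also a member of b? */
bool toy_is_subset(ToyRelids a, ToyRelids b)
{
    return (a & ~b) == 0;
}
```

A subset test of this kind is what lets the planner ask whether every relation a sublink refers to is among the relations available at the point where it would be attached.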
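For a FromExpr, the function first recurses into every item of the FROM list, pulling up the sublinks each one contains, and builds a replacement FromExpr from the results. It then pulls up the sublinks found in the FromExpr's own quals.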
FromExpr *f = (FromExpr *) jtnode;
List *newfromlist = NIL;
Relids frelids = NULL;
ListCell *l;
/* First, recurse to process children and collect their relids */
foreach(l, f->fromlist){
Relids childrelids;
Node *newchild = pull_up_sublinks_jointree_recurse(root, lfirst(l), &childrelids);
newfromlist = lappend(newfromlist, newchild);
frelids = bms_join(frelids, childrelids);
}
/* Build the replacement FromExpr; no quals yet */
FromExpr *newf = makeFromExpr(newfromlist, NULL);
/* Set up a link representing the rebuilt jointree */
Node *jtlink = (Node *) newf;
/* Now process qual --- all children are available for use */
newf->quals = pull_up_sublinks_qual_recurse(root, f->quals, &jtlink, frelids, NULL, NULL);
/* The result is either newf, or a stack of JoinExprs with newf at the base */
*relids = frelids;
jtnode = jtlink;
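For a JoinExpr, the function recurses into the left and right subtrees of the join and pulls up the sublinks they contain, then pulls up the sublinks in the join's own quals. How the quals are handled depends on the join type: for an inner join the available relations are bms_union(leftrelids, rightrelids); for a left join only rightrelids is available; for a right join only leftrelids is available; and the quals of a full join are left untouched.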
/* Make a modifiable copy of join node, but don't bother copying its subnodes (yet). */
JoinExpr *j = (JoinExpr *) palloc(sizeof(JoinExpr));
memcpy(j, jtnode, sizeof(JoinExpr));
Node *jtlink = (Node *) j;
Relids leftrelids; Relids rightrelids;
/* Recurse to process children and collect their relids */
j->larg = pull_up_sublinks_jointree_recurse(root, j->larg, &leftrelids);
j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg, &rightrelids);
/* Now process qual, showing appropriate child relids as available, and attach any pulled-up jointree items at the right place. In the inner-join case we put new JoinExprs above the existing one (much as for a FromExpr-style join). In outer-join cases the new JoinExprs must go into the nullable side of the outer join. The point of the available_rels machinations is to ensure that we only pull up quals for which that's okay. We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI nodes here. */
switch (j->jointype) {
case JOIN_INNER: j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &jtlink, bms_union(leftrelids, rightrelids), NULL, NULL); break;
case JOIN_LEFT: j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &j->rarg, rightrelids, NULL, NULL); break;
case JOIN_FULL: /* can't do anything with full-join quals */ break;
case JOIN_RIGHT:
j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &j->larg, leftrelids, NULL, NULL); break;
default: elog(ERROR, "unrecognized join type: %d", (int) j->jointype);
break;
}
/* Although we could include the pulled-up subqueries in the returned relids, there's no need since upper quals couldn't refer to their outputs anyway. But we *do* need to include the join's own rtindex because we haven't yet collapsed join alias variables, so upper levels would mistakenly think they couldn't use references to this join. */
*relids = bms_join(leftrelids, rightrelids);
if (j->rtindex) *relids = bms_add_member(*relids, j->rtindex);
jtnode = jtlink;
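The two set computations in the JoinExpr branch can be isolated in a small model: which child relations a pulled-up sublink in the ON clause may refer to, and which relids the join reports to its parent. This is a sketch under the same 64-bit-mask assumption as a toy stand-in for Bitmapset; `ToyRelids`, `ToyJoinType`, `toy_available_rels`, and `toy_join_relids` are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i is set when range table index i is a member of the set. */
typedef uint64_t ToyRelids;

typedef enum ToyJoinType {
    TOY_JOIN_INNER,
    TOY_JOIN_LEFT,
    TOY_JOIN_FULL,
    TOY_JOIN_RIGHT
} ToyJoinType;

/*
 * Relations a pulled-up sublink in the ON clause may reference. For outer
 * joins this is the nullable side only; an empty set (0) means no pull-up
 * is attempted.
 */
ToyRelids toy_available_rels(ToyJoinType type, ToyRelids left, ToyRelids right)
{
    switch (type) {
        case TOY_JOIN_INNER: return left | right;
        case TOY_JOIN_LEFT:  return right;  /* right side is nullable */
        case TOY_JOIN_RIGHT: return left;   /* left side is nullable */
        case TOY_JOIN_FULL:  return 0;      /* full-join quals are left alone */
    }
    return 0;
}

/*
 * Relids reported to the parent: both children plus the join's own range
 * table index, where 0 means the join has no range table entry.
 */
ToyRelids toy_join_relids(ToyRelids left, ToyRelids right, int rtindex)
{
    ToyRelids result = left | right;

    assert(rtindex >= 0 && rtindex <= 63);
    if (rtindex != 0)
        result |= (ToyRelids) 1 << rtindex;
    return result;
}
```

Note the asymmetry: the available set shrinks to one side for outer joins, but the returned relids always cover both sides, so that quals higher up can still refer to either input and to the join's alias.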