JOIN type syntax support
The FROM clause accepts both JOIN expressions and plain lists of table names, and the grammar deliberately keeps joined_table separate from table_ref. As the comment in gram.y explains: "It may seem silly to separate joined_table from table_ref, but there is method in SQL's madness: if you don't do it this way you get reduce-reduce conflicts, because it's not clear to the parser generator whether to expect alias_clause after ')' or not."
```
/* table_ref is where an alias clause can be attached. */
table_ref:  /* ... other alternatives omitted ... */
          | joined_table
                { $$ = (Node *) $1; }
          | '(' joined_table ')' alias_clause
                { $2->alias = $4; $$ = (Node *) $2; }
        ;

joined_table:
            '(' joined_table ')'
                { $$ = $2; }
          | table_ref CROSS JOIN table_ref
                {
                    /* CROSS JOIN is same as unqualified inner join */
                    JoinExpr *n = makeNode(JoinExpr);
                    n->jointype = JOIN_INNER; n->isNatural = FALSE;
                    n->larg = $1; n->rarg = $4;
                    n->usingClause = NIL; n->quals = NULL;
                    $$ = n;
                }
          | table_ref join_type JOIN table_ref join_qual
                {
                    JoinExpr *n = makeNode(JoinExpr);
                    n->jointype = $2; n->isNatural = FALSE;
                    n->larg = $1; n->rarg = $4;
                    if ($5 != NULL && IsA($5, List))
                        n->usingClause = (List *) $5;  /* USING clause */
                    else
                        n->quals = $5;                 /* ON clause */
                    $$ = n;
                }
          | table_ref JOIN table_ref join_qual
                {
                    /* letting join_type reduce to empty doesn't work */
                    JoinExpr *n = makeNode(JoinExpr);
                    n->jointype = JOIN_INNER; n->isNatural = FALSE;
                    n->larg = $1; n->rarg = $3;
                    if ($4 != NULL && IsA($4, List))
                        n->usingClause = (List *) $4;  /* USING clause */
                    else
                        n->quals = $4;                 /* ON clause */
                    $$ = n;
                }
          | table_ref NATURAL join_type JOIN table_ref
                {
                    JoinExpr *n = makeNode(JoinExpr);
                    n->jointype = $3; n->isNatural = TRUE;
                    n->larg = $1; n->rarg = $5;
                    n->usingClause = NIL;  /* figure out which columns later... */
                    n->quals = NULL;       /* fill later */
                    $$ = n;
                }
          | table_ref NATURAL JOIN table_ref
                {
                    /* letting join_type reduce to empty doesn't work */
                    JoinExpr *n = makeNode(JoinExpr);
                    n->jointype = JOIN_INNER; n->isNatural = TRUE;
                    n->larg = $1; n->rarg = $4;
                    n->usingClause = NIL;  /* figure out which columns later... */
                    n->quals = NULL;       /* fill later */
                    $$ = n;
                }
        ;
```
For the same reason we must treat 'JOIN' and 'join_type JOIN' separately, rather than allowing join_type to expand to empty; if we try it, the parser generator can't figure out when to reduce an empty join_type right after table_ref. The productions above yield the combinations listed in the table below. join_type itself accepts FULL [OUTER], LEFT [OUTER], RIGHT [OUTER] and INNER (tokens FULL, LEFT, RIGHT and INNER_P, with an optional OUTER_P after the first three).
Parser syntax | JoinExpr.jointype | isNatural | usingClause | quals |
---|---|---|---|---|
CROSS JOIN (same as an unqualified inner join) | JOIN_INNER | FALSE | NIL | NULL |
join_type JOIN ... join_qual | value of join_type | FALSE | USING list from join_qual (else NIL) | ON expression from join_qual (else NULL) |
JOIN ... join_qual | JOIN_INNER | FALSE | USING list from join_qual (else NIL) | ON expression from join_qual (else NULL) |
NATURAL join_type JOIN | value of join_type | TRUE | NIL (columns resolved later) | NULL (filled later) |
NATURAL JOIN | JOIN_INNER | TRUE | NIL (columns resolved later) | NULL (filled later) |
```
join_type:  FULL join_outer     { $$ = JOIN_FULL; }
          | LEFT join_outer     { $$ = JOIN_LEFT; }
          | RIGHT join_outer    { $$ = JOIN_RIGHT; }
          | INNER_P             { $$ = JOIN_INNER; }
        ;

/* OUTER is just noise... */
join_outer: OUTER_P             { $$ = NULL; }
          | /*EMPTY*/           { $$ = NULL; }
        ;
```
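Taken together, join_type and join_outer form a small keyword-to-enum mapping. OUTER is optional noise after FULL, LEFT or RIGHT, and it is not accepted after INNER. The C sketch below mimics that mapping over keyword strings. The helper parse_join_type and its string interface are hypothetical and only illustrate the grammar. The real parser works on lexer tokens, not strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Only the four parser-visible codes; the real enum has more members. */
typedef enum { JOIN_INNER, JOIN_LEFT, JOIN_FULL, JOIN_RIGHT } JoinType;

/*
 * Hypothetical helper mirroring join_type/join_outer: kw1 is the first
 * keyword (upper case), kw2 the optional second keyword or NULL.
 * Returns true and stores the JoinType in *out on success.
 */
static bool
parse_join_type(const char *kw1, const char *kw2, JoinType *out)
{
    assert(kw1 != NULL && out != NULL);

    /* INNER_P takes no join_outer */
    if (strcmp(kw1, "INNER") == 0)
    {
        if (kw2 != NULL)
            return false;
        *out = JOIN_INNER;
        return true;
    }

    /* join_outer is OUTER_P or empty: OUTER is just noise */
    if (kw2 != NULL && strcmp(kw2, "OUTER") != 0)
        return false;

    if (strcmp(kw1, "FULL") == 0)
        *out = JOIN_FULL;
    else if (strcmp(kw1, "LEFT") == 0)
        *out = JOIN_LEFT;
    else if (strcmp(kw1, "RIGHT") == 0)
        *out = JOIN_RIGHT;
    else
        return false;
    return true;
}
```

For example, "LEFT" and "LEFT OUTER" both yield JOIN_LEFT, while "INNER OUTER" is rejected, just as the grammar would reject it.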
The examples below use two tables, Student and Course. The column Student.C_S_Id is a foreign key that references the primary key Course.C_Id.
JOIN_INNER
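PG syntax CROSS JOIN (CROSS JOIN is the same as an unqualified inner join); internal type JOIN_INNER. A cross join returns the Cartesian product of the two tables, so the number of result rows equals the product of the two tables' row counts. It takes no join condition:
`select * from Student s cross join Course c`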
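The row-count claim can be made concrete with a minimal nested-loop cross join over row indexes. This is a sketch, not PostgreSQL's executor, and the Pair type and cross_join function are hypothetical. Every left row is paired with every right row, so a 3-row Student table and a 4-row Course table yield 3 * 4 = 12 rows.

```c
#include <assert.h>
#include <stddef.h>

/* One output row, as a pair of row indexes into the two inputs. */
typedef struct
{
    int left;
    int right;
} Pair;

/*
 * Nested-loop cross join: there is no join condition, so every left row is
 * paired with every right row.  The caller sizes out[] for nleft * nright.
 */
static size_t
cross_join(size_t nleft, size_t nright, Pair *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nleft; i++)
        for (size_t j = 0; j < nright; j++)
        {
            assert(n < cap);
            out[n].left = (int) i;
            out[n].right = (int) j;
            n++;
        }
    return n;   /* always nleft * nright */
}
```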
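PG syntax INNER JOIN (INNER_P); internal type JOIN_INNER. An inner join returns only the row pairs that satisfy the ON condition. The result is the "intersection" of the two tables: rows that have a match on both sides.
`select * from Student s inner join Course c on s.C_S_Id=c.C_Id`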
PG syntax JOIN with a USING or ON clause; internal type JOIN_INNER. Same as above.
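Below is a minimal nested-loop inner join that follows the Student/Course example. The sketch makes several assumptions. Keys are ints, and INT_MIN stands in for SQL NULL (KEY_NULL is hypothetical). As in SQL, a comparison with NULL is never true. The sample data is made up: Student.C_S_Id = {1, 2, NULL, 1} and Course.C_Id = {1, 2, 3}.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_NULL INT_MIN   /* hypothetical stand-in for SQL NULL */

typedef struct
{
    int left;    /* row index in the left input */
    int right;   /* row index in the right input */
} Pair;

/* SQL "=": a comparison involving NULL is never true */
static bool
keys_match(int a, int b)
{
    return a != KEY_NULL && b != KEY_NULL && a == b;
}

/* Nested-loop inner join on lk[i] = rk[j]: emits matching pairs only. */
static size_t
inner_join(const int *lk, size_t nl, const int *rk, size_t nr,
           Pair *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
        for (size_t j = 0; j < nr; j++)
            if (keys_match(lk[i], rk[j]))
            {
                assert(n < cap);
                out[n++] = (Pair) { (int) i, (int) j };
            }
    return n;
}
```

With the sample data, the student whose C_S_Id is NULL and the course that no student references (C_Id = 3) do not appear in the output.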
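PG syntax NATURAL INNER JOIN (NATURAL join_type JOIN with INNER_P); internal type JOIN_INNER. Frankly, this kind of join has little practical value, but it is defined in the SQL-92 (SQL2) standard. A natural join needs no explicit join columns. The join condition is built from all columns that have the same name in both tables, exactly as if they were listed in a USING clause, and each such column appears only once in the output of SELECT *. A natural join cannot take an ON or USING clause: the grammar above has no join_qual after NATURAL ... JOIN. If the two tables share no column names, a natural join degenerates into a cross join. NATURAL can be combined with every join type except CROSS JOIN. Some examples: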
```sql
SELECT * FROM ORDERS O NATURAL INNER JOIN CUSTOMERS C;
SELECT * FROM ORDERS O NATURAL LEFT OUTER JOIN CUSTOMERS C;
SELECT * FROM ORDERS O NATURAL RIGHT OUTER JOIN CUSTOMERS C;
SELECT * FROM ORDERS O NATURAL FULL OUTER JOIN CUSTOMERS C;
```
PG syntax NATURAL JOIN; internal type JOIN_INNER. Same as above.
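The column-matching step of a natural join can be sketched in C. The function collects the column names present in both tables, and those names act as an implicit USING list. The helper natural_join_columns and the ORDERS/CUSTOMERS column lists in the usage note are hypothetical. The sketch keeps names in left-table order and ignores the duplicate-name and type checks a real parser performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Collect the column names present in both tables, in left-table order.
 * A NATURAL join then behaves like JOIN ... USING (those columns); with no
 * common names it behaves like a cross join.
 */
static size_t
natural_join_columns(const char **lcols, size_t nl,
                     const char **rcols, size_t nr,
                     const char **out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
        for (size_t j = 0; j < nr; j++)
            if (strcmp(lcols[i], rcols[j]) == 0)
            {
                assert(n < cap);
                out[n++] = lcols[i];
                break;          /* one entry per left column */
            }
    return n;
}
```

Suppose ORDERS had columns (order_id, customer_id, amount) and CUSTOMERS had (customer_id, name, city). The only common name is customer_id, so `ORDERS NATURAL INNER JOIN CUSTOMERS` would behave like `ORDERS INNER JOIN CUSTOMERS USING (customer_id)`.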
JOIN_FULL
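PG syntax FULL [OUTER] JOIN (FULL, optionally followed by OUTER_P) with a USING or ON clause; internal type JOIN_FULL. A full outer join (full join / full outer join) returns the row pairs that satisfy the ON condition plus every unmatched row from both tables. An unmatched row of the left table gets NULLs in the right table's columns, and an unmatched row of the right table gets NULLs in the left table's columns. In other words, the result is the union of the left and right outer joins.
`select * from Student s full join Course c on s.C_S_Id=c.C_Id`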
PG syntax NATURAL FULL [OUTER] JOIN; internal type JOIN_FULL.
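A full join can be sketched as a nested loop that remembers which right rows found a partner. The two phases match the enum comment "pairs + unmatched LHS + unmatched RHS". The sketch uses the same assumptions as the inner-join example: integer keys, INT_MIN as a stand-in NULL, and hypothetical names. The caller provides a scratch array right_seen with one entry per right row. In Pair, -1 marks a NULL-extended side.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_NULL INT_MIN   /* hypothetical stand-in for SQL NULL */

typedef struct
{
    int left;    /* row index in the left input, -1 = NULL-extended */
    int right;   /* row index in the right input, -1 = NULL-extended */
} Pair;

static bool
keys_match(int a, int b)
{
    return a != KEY_NULL && b != KEY_NULL && a == b;
}

/* JOIN_FULL: matching pairs + unmatched left rows + unmatched right rows */
static size_t
full_join(const int *lk, size_t nl, const int *rk, size_t nr,
          bool *right_seen, Pair *out, size_t cap)
{
    size_t n = 0;

    for (size_t j = 0; j < nr; j++)
        right_seen[j] = false;

    /* Phase 1: like a LEFT JOIN, remembering which right rows matched */
    for (size_t i = 0; i < nl; i++)
    {
        bool matched = false;

        for (size_t j = 0; j < nr; j++)
            if (keys_match(lk[i], rk[j]))
            {
                assert(n < cap);
                out[n++] = (Pair) { (int) i, (int) j };
                matched = right_seen[j] = true;
            }
        if (!matched)
        {
            assert(n < cap);
            out[n++] = (Pair) { (int) i, -1 };   /* right columns NULL */
        }
    }

    /* Phase 2: right rows nobody matched, with the left columns NULL */
    for (size_t j = 0; j < nr; j++)
        if (!right_seen[j])
        {
            assert(n < cap);
            out[n++] = (Pair) { -1, (int) j };
        }
    return n;
}
```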
JOIN_LEFT
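PG syntax LEFT [OUTER] JOIN with a USING or ON clause; internal type JOIN_LEFT. A left outer join (left join / left outer join) takes the left table as the reference. Every row of the left table is returned. Where a right-table row satisfies the ON condition, its values are filled in; otherwise the right table's columns are NULL.
`select * from Student s left join Course c on s.C_S_Id=c.C_Id`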
PG syntax NATURAL LEFT [OUTER] JOIN; internal type JOIN_LEFT.
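Here is a minimal nested-loop left join, matching the enum comment "pairs + unmatched LHS tuples". It uses the same assumptions as the sketches above: integer keys, INT_MIN as a stand-in NULL, -1 for a NULL-extended side, and hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_NULL INT_MIN   /* hypothetical stand-in for SQL NULL */

typedef struct
{
    int left;    /* row index in the left input */
    int right;   /* row index in the right input, -1 = NULL-extended */
} Pair;

static bool
keys_match(int a, int b)
{
    return a != KEY_NULL && b != KEY_NULL && a == b;
}

/* JOIN_LEFT: matching pairs + each unmatched left row with NULL right columns */
static size_t
left_join(const int *lk, size_t nl, const int *rk, size_t nr,
          Pair *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
    {
        bool matched = false;

        for (size_t j = 0; j < nr; j++)
            if (keys_match(lk[i], rk[j]))
            {
                assert(n < cap);
                out[n++] = (Pair) { (int) i, (int) j };
                matched = true;
            }
        if (!matched)
        {
            assert(n < cap);
            out[n++] = (Pair) { (int) i, -1 };   /* right side NULL */
        }
    }
    return n;
}
```

With Student.C_S_Id = {1, 2, NULL, 1} and Course.C_Id = {1, 2, 3}, the join returns all four students. The student with a NULL C_S_Id gets NULL course columns.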
JOIN_RIGHT
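PG syntax RIGHT [OUTER] JOIN with a USING or ON clause; internal type JOIN_RIGHT. A right outer join (right join / right outer join) takes the right table as the reference. Every row of the right table is returned. Where a left-table row satisfies the ON condition, its values are filled in; otherwise the left table's columns are NULL.
`select * from Student s right join Course c on s.C_S_Id=c.C_Id`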
PG syntax NATURAL RIGHT [OUTER] JOIN; internal type JOIN_RIGHT.
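A right join is a left join with the inputs swapped. The planner relies on the same symmetry: for a JOIN_LEFT, make_join_rel also tries JOIN_RIGHT with rel1 and rel2 exchanged. The sketch below computes B LEFT JOIN A and flips each pair back. It uses the same hypothetical conventions as the left-join sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_NULL INT_MIN   /* hypothetical stand-in for SQL NULL */

typedef struct
{
    int left;    /* row index in the left input, -1 = NULL-extended */
    int right;   /* row index in the right input, -1 = NULL-extended */
} Pair;

static bool
keys_match(int a, int b)
{
    return a != KEY_NULL && b != KEY_NULL && a == b;
}

static size_t
left_join(const int *lk, size_t nl, const int *rk, size_t nr,
          Pair *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
    {
        bool matched = false;

        for (size_t j = 0; j < nr; j++)
            if (keys_match(lk[i], rk[j]))
            {
                assert(n < cap);
                out[n++] = (Pair) { (int) i, (int) j };
                matched = true;
            }
        if (!matched)
        {
            assert(n < cap);
            out[n++] = (Pair) { (int) i, -1 };
        }
    }
    return n;
}

/* A RIGHT JOIN B computed as B LEFT JOIN A, then each pair flipped back */
static size_t
right_join(const int *lk, size_t nl, const int *rk, size_t nr,
           Pair *out, size_t cap)
{
    size_t n = left_join(rk, nr, lk, nl, out, cap);

    for (size_t k = 0; k < n; k++)
    {
        int tmp = out[k].left;

        out[k].left = out[k].right;
        out[k].right = tmp;
    }
    return n;
}
```

With the same sample data, every course appears. Course 3, which no student references, comes out with NULL student columns.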
```c
/*
 * JoinType - enums for types of relation joins
 *
 * JoinType determines the exact semantics of joining two relations using
 * a matching qualification.  For example, it tells what to do with a tuple
 * that has no match in the other relation.
 *
 * This is needed in both parsenodes.h and plannodes.h, so put it here...
 */
typedef enum JoinType
{
    /*
     * The canonical kinds of joins according to the SQL JOIN syntax. Only
     * these codes can appear in parser output (e.g., JoinExpr nodes).
     */
    JOIN_INNER,         /* matching tuple pairs only */
    JOIN_LEFT,          /* pairs + unmatched LHS tuples */
    JOIN_FULL,          /* pairs + unmatched LHS + unmatched RHS */
    JOIN_RIGHT,         /* pairs + unmatched RHS tuples */
    /* ... further members omitted ... */
} JoinType;
```
JOIN type conversion support
```c
typedef enum JoinType
{
    /* ... JOIN_INNER, JOIN_LEFT, JOIN_FULL, JOIN_RIGHT ... */

    /*
     * Semijoins and anti-semijoins (as defined in relational theory) do not
     * appear in the SQL JOIN syntax, but there are standard idioms for
     * representing them (e.g., using EXISTS).  The planner recognizes these
     * cases and converts them to joins.  So the planner and executor must
     * support these codes.  NOTE: in JOIN_SEMI output, it is unspecified
     * which matching RHS row is joined to.  In JOIN_ANTI output, the row is
     * guaranteed to be null-extended.
     */
    JOIN_SEMI,          /* 1 copy of each LHS row that has match(es) */
    JOIN_ANTI,          /* 1 copy of each LHS row that has no match */
    JOIN_LASJ_NOTIN,    /* Left Anti Semi Join with Not-In semantics:
                         * If any NULL values are produced by inner side,
                         * return no join results. Otherwise, same as LASJ */
    /* ... */
} JoinType;
```
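A semi join (SEMI JOIN) returns a row of the first table when the second table contains one or more matching rows. Unlike a regular join, a semi join returns each row of the first table at most once. SQL has no SEMI JOIN syntax. A semi join is usually obtained by converting an IN or EXISTS subquery. SQL examples: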
```sql
SELECT * FROM employees WHERE dept_name IN ( SELECT dept_name FROM departments )
SELECT * FROM employees WHERE EXISTS ( SELECT * FROM departments WHERE employees.dept_name = departments.dept_name )
```
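The "at most once" property is the only difference from an inner join. A semi join can stop scanning the right side at the first match. The sketch below uses hypothetical integer department keys instead of dept_name, assumes no NULLs, and uses made-up data. The departments side holds a duplicate key, as could happen if dept_name were not unique.

```c
#include <assert.h>
#include <stddef.h>

/*
 * JOIN_SEMI: each left row is emitted at most once, as soon as one matching
 * right row is found.  Keys are plain ints here (no NULLs) for brevity.
 */
static size_t
semi_join(const int *lk, size_t nl, const int *rk, size_t nr,
          int *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
        for (size_t j = 0; j < nr; j++)
            if (lk[i] == rk[j])
            {
                assert(n < cap);
                out[n++] = (int) i;
                break;      /* stop at the first match: one copy per left row */
            }
    return n;
}
```

With employees {10, 20, 10, 30} and departments {10, 10, 20}, an inner join would produce 5 rows: employees 0 and 2 each match twice. The semi join returns each of employees 0, 1 and 2 once.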
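An anti join (ANTI JOIN) is the opposite of a semi join: it returns a row of the first table when the second table contains no matching row. SQL has no ANTI JOIN syntax either. An anti join is usually obtained by converting a NOT IN or NOT EXISTS subquery. The two forms are not fully equivalent. If the subquery result contains a NULL, NOT IN returns no rows at all, which is exactly the JOIN_LASJ_NOTIN behavior described in the enum comment above. SQL examples: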
```sql
SELECT * FROM employees WHERE dept_name NOT IN ( SELECT dept_name FROM departments )
SELECT * FROM employees WHERE NOT EXISTS ( SELECT * FROM departments WHERE employees.dept_name = departments.dept_name )
```
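The difference between NOT EXISTS (JOIN_ANTI) and NOT IN (JOIN_LASJ_NOTIN) comes down to NULL handling. The sketch below models the SQL-level semantics, not GPDB's executor. The enum comment only covers NULLs on the inner side. The treatment of a NULL outer key here follows standard SQL three-valued logic. Integer keys, KEY_NULL and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_NULL INT_MIN   /* hypothetical stand-in for SQL NULL */

static bool
keys_match(int a, int b)
{
    return a != KEY_NULL && b != KEY_NULL && a == b;
}

/* JOIN_ANTI (NOT EXISTS): left rows for which no right row matches */
static size_t
anti_join(const int *lk, size_t nl, const int *rk, size_t nr,
          int *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
    {
        bool found = false;

        for (size_t j = 0; j < nr && !found; j++)
            found = keys_match(lk[i], rk[j]);
        if (!found)
        {
            assert(n < cap);
            out[n++] = (int) i;
        }
    }
    return n;
}

/*
 * NOT IN: a row qualifies only if "key <> r" is TRUE for every right value r.
 * So an empty right side keeps every row, a NULL on the right side keeps none,
 * and a NULL left key is dropped whenever the right side is non-empty.
 */
static size_t
not_in_join(const int *lk, size_t nl, const int *rk, size_t nr,
            int *out, size_t cap)
{
    for (size_t j = 0; j < nr; j++)
        if (rk[j] == KEY_NULL)
            return 0;             /* "return no join results" */

    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
    {
        if (nr > 0 && lk[i] == KEY_NULL)
            continue;             /* NULL <> x is unknown, never TRUE */

        bool found = false;

        for (size_t j = 0; j < nr && !found; j++)
            found = keys_match(lk[i], rk[j]);
        if (!found)
        {
            assert(n < cap);
            out[n++] = (int) i;
        }
    }
    return n;
}
```

Take employees {10, 20, NULL, 40} and departments {10, 20}. NOT EXISTS keeps employees 2 and 3, but NOT IN keeps only employee 3. If departments instead held {10, NULL}, NOT IN would return nothing.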
As the comments above indicate, these join types are created when sublinks (IN/EXISTS subqueries in the query's quals) are pulled up and converted into joins. The call stacks are:

- pull_up_sublinks --> pull_up_sublinks_jointree_recurse --> pull_up_sublinks_qual_recurse --> convert_ANY_sublink_to_join --> result->jointype = JOIN_SEMI
- pull_up_sublinks --> pull_up_sublinks_jointree_recurse --> pull_up_sublinks_qual_recurse --> convert_EXISTS_sublink_to_join --> result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI;
- pull_up_sublinks --> pull_up_sublinks_jointree_recurse --> pull_up_sublinks_qual_recurse --> convert_IN_to_antijoin --> JoinExpr *join_expr = make_join_expr(NULL, subq_indx, JOIN_LASJ_NOTIN)
- subquery_planner --> reduce_outer_joins (if we have any outer joins, try to reduce them to plain inner joins) --> reduce_outer_joins_pass2 --> jointype = JOIN_ANTI (checks whether a JOIN_LEFT can be reduced to JOIN_ANTI)
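The last call stack covers a pattern like `SELECT ... FROM a LEFT JOIN b ON a.x = b.y WHERE b.y IS NULL`. Because `=` is strict, every matched row has a non-NULL b.y, so only the NULL-extended rows pass the WHERE clause. That is exactly an anti join. The sketch below computes both plans and lets the test check that they agree. Integer keys, KEY_NULL, Row and the function names are hypothetical, and the data is made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_NULL INT_MIN   /* hypothetical stand-in for SQL NULL */

typedef struct
{
    int a_row;   /* row index in a */
    int b_y;     /* b.y in the joined row, KEY_NULL if NULL-extended */
} Row;

static bool
keys_match(int x, int y)
{
    return x != KEY_NULL && y != KEY_NULL && x == y;
}

/* a LEFT JOIN b ON a.x = b.y */
static size_t
left_join(const int *ax, size_t na, const int *by, size_t nb,
          Row *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < na; i++)
    {
        bool matched = false;

        for (size_t j = 0; j < nb; j++)
            if (keys_match(ax[i], by[j]))
            {
                assert(n < cap);
                out[n++] = (Row) { (int) i, by[j] };
                matched = true;
            }
        if (!matched)
        {
            assert(n < cap);
            out[n++] = (Row) { (int) i, KEY_NULL };
        }
    }
    return n;
}

/* ... WHERE b.y IS NULL, applied on top of the LEFT JOIN output */
static size_t
where_b_y_is_null(const Row *rows, size_t nrows, int *out, size_t cap)
{
    size_t n = 0;

    for (size_t k = 0; k < nrows; k++)
        if (rows[k].b_y == KEY_NULL)
        {
            assert(n < cap);
            out[n++] = rows[k].a_row;
        }
    return n;
}

/* The JOIN_ANTI the planner can use instead: rows of a with no match in b */
static size_t
anti_join(const int *ax, size_t na, const int *by, size_t nb,
          int *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < na; i++)
    {
        bool found = false;

        for (size_t j = 0; j < nb && !found; j++)
            found = keys_match(ax[i], by[j]);
        if (!found)
        {
            assert(n < cap);
            out[n++] = (int) i;
        }
    }
    return n;
}
```

The anti join never materializes the matched pairs that the WHERE clause would throw away anyway. This is the point of the reduction.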
JOIN types used internally by the planner
```c
typedef enum JoinType
{
    /* ... parser-visible codes and semi/anti join codes ... */

    /*
     * These codes are used internally in the planner, but are not supported
     * by the executor (nor, indeed, by most of the planner).
     */
    JOIN_UNIQUE_OUTER,          /* LHS path must be made unique */
    JOIN_UNIQUE_INNER,          /* RHS path must be made unique */

    /*
     * GPDB: Like JOIN_UNIQUE_OUTER/INNER, these codes are used internally
     * in the planner, but are not supported by the executor or by most of the
     * planner. A JOIN_DEDUP_SEMI join indicates a semi-join, but to be
     * implemented by performing a normal inner join, and eliminating the
     * duplicates with a UniquePath above the join. That can be useful in
     * an MPP environment, if performing the join as an inner join avoids
     * moving the larger of the two relations.
     */
    JOIN_DEDUP_SEMI,            /* inner join, LHS path must be made unique afterwards */
    JOIN_DEDUP_SEMI_REVERSE     /* inner join, RHS path must be made unique afterwards */
} JoinType;
```
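JOIN_UNIQUE_INNER and JOIN_DEDUP_SEMI are two ways of getting semi-join results out of a plain inner join. The first makes the RHS unique before joining. The second joins first and then removes duplicate LHS rows, as a UniquePath above the join would. The sketch below uses hypothetical integer keys (no NULLs) and hypothetical function names. Both functions return the same LHS rows a JOIN_SEMI would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Sort v and squeeze out duplicates in place; returns the new length. */
static size_t
sort_unique(int *v, size_t n)
{
    if (n == 0)
        return 0;
    qsort(v, n, sizeof v[0], cmp_int);

    size_t k = 1;

    for (size_t i = 1; i < n; i++)
        if (v[i] != v[k - 1])
            v[k++] = v[i];
    return k;
}

/* JOIN_UNIQUE_INNER idea: make the RHS keys unique, then do a plain inner
 * join.  Note: rk is sorted and deduplicated in place. */
static size_t
semi_via_unique_inner(const int *lk, size_t nl, int *rk, size_t nr,
                      int *out, size_t cap)
{
    size_t nu = sort_unique(rk, nr);
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
        for (size_t j = 0; j < nu; j++)
            if (lk[i] == rk[j])
            {
                assert(n < cap);
                out[n++] = (int) i;   /* at most one hit per i: keys unique */
            }
    return n;
}

/* JOIN_DEDUP_SEMI idea: plain inner join first, then remove duplicate LHS
 * rows.  Returns LHS row indexes in ascending order. */
static size_t
semi_via_dedup(const int *lk, size_t nl, const int *rk, size_t nr,
               int *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nl; i++)
        for (size_t j = 0; j < nr; j++)
            if (lk[i] == rk[j])
            {
                assert(n < cap);
                out[n++] = (int) i;   /* repeats i once per matching RHS row */
            }
    return sort_unique(out, n);
}
```

The two differ in where the deduplication happens. The DEDUP variant lets the join itself run as an ordinary inner join. Per the GPDB comment, this can avoid moving the larger relation in an MPP cluster.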
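As the enum's header comment says, a JoinType determines the exact semantics of joining two relations using a matching qualification; for example, it tells what to do with a tuple that has no match in the other relation. The planner-internal codes are used in make_join_rel. This function finds or creates a join RelOptInfo that represents the join of the two given rels, and adds to it path information for paths created with the two rels as outer and inner rel. (The join rel may already contain paths generated from other pairs of rels that add up to the same set of base rels.) Its main steps are: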
- Construct the Relids set that identifies the joinrel:
  `Relids joinrelids = bms_union(rel1->relids, rel2->relids);`
- Check validity and determine the join type:
  `join_is_legal(root, rel1, rel2, joinrelids, &sjinfo, &reversed)`
- Find or build the join RelOptInfo, and compute the restrictlist that goes with this particular joining:
  `RelOptInfo *joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo, &restrictlist);`
- When sjinfo->jointype is JOIN_INNER, both input orders are tried:
  `add_paths_to_joinrel(root, joinrel, rel1, rel2, JOIN_INNER, sjinfo, restrictlist);`
  `add_paths_to_joinrel(root, joinrel, rel2, rel1, JOIN_INNER, sjinfo, restrictlist);`
- When sjinfo->jointype is JOIN_LEFT, the swapped input order is tried as JOIN_RIGHT:
  `add_paths_to_joinrel(root, joinrel, rel1, rel2, JOIN_LEFT, sjinfo, restrictlist);`
  `add_paths_to_joinrel(root, joinrel, rel2, rel1, JOIN_RIGHT, sjinfo, restrictlist);`
- When sjinfo->jointype is JOIN_FULL:
  `add_paths_to_joinrel(root, joinrel, rel1, rel2, JOIN_FULL, sjinfo, restrictlist);`
  `add_paths_to_joinrel(root, joinrel, rel2, rel1, JOIN_FULL, sjinfo, restrictlist);`
- When sjinfo->jointype is JOIN_SEMI: we might have a normal semijoin, or a case where we don't have enough rels to do the semijoin but can unique-ify the RHS and then do an innerjoin (see comments in join_is_legal). In the latter case we can't apply JOIN_SEMI joining.
  - For a normal semijoin, GPDB tries JOIN_SEMI plus its two DEDUP variants:
    `add_paths_to_joinrel(root, joinrel, rel1, rel2, JOIN_SEMI, sjinfo, restrictlist);`
    `add_paths_to_joinrel(root, joinrel, rel1, rel2, JOIN_DEDUP_SEMI, sjinfo, restrictlist);`
    `add_paths_to_joinrel(root, joinrel, rel2, rel1, JOIN_DEDUP_SEMI_REVERSE, sjinfo, restrictlist);`
  - If we know how to unique-ify the RHS and one input rel is exactly the RHS (not a superset), we can consider unique-ifying it and then doing a regular join. `create_unique_path(root, rel2, rel2->cheapest_total_path, sjinfo)` performs that check, after which:
    `add_paths_to_joinrel(root, joinrel, rel1, rel2, JOIN_UNIQUE_INNER, sjinfo, restrictlist);`
    `add_paths_to_joinrel(root, joinrel, rel2, rel1, JOIN_UNIQUE_OUTER, sjinfo, restrictlist);`
- When sjinfo->jointype is JOIN_ANTI or JOIN_LASJ_NOTIN:
  `add_paths_to_joinrel(root, joinrel, rel1, rel2, sjinfo->jointype, sjinfo, restrictlist)`
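The dispatch above can be summarized as a table of (outer rel, jointype) attempts per sjinfo->jointype. The C sketch below records the add_paths_to_joinrel calls listed in the steps. It is a simplification, not the real make_join_rel. PathAttempt and plan_attempts are hypothetical. The flag rhs_can_be_unique stands in for the create_unique_path() check. The relid-subset tests, dummy-rel handling and cost logic are omitted, and the enum order is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    JOIN_INNER, JOIN_LEFT, JOIN_FULL, JOIN_RIGHT,
    JOIN_SEMI, JOIN_ANTI, JOIN_LASJ_NOTIN,
    JOIN_UNIQUE_OUTER, JOIN_UNIQUE_INNER,
    JOIN_DEDUP_SEMI, JOIN_DEDUP_SEMI_REVERSE
} JoinType;

/* One add_paths_to_joinrel() call: which rel is outer (1 or 2), and the type */
typedef struct
{
    int      outer;
    JoinType jointype;
} PathAttempt;

static size_t
plan_attempts(JoinType jt, bool rhs_can_be_unique, PathAttempt *out, size_t cap)
{
    size_t n = 0;

#define TRY(o, t) \
    do { assert(n < cap); out[n++] = (PathAttempt) { (o), (t) }; } while (0)

    switch (jt)
    {
        case JOIN_INNER:
            TRY(1, JOIN_INNER);
            TRY(2, JOIN_INNER);
            break;
        case JOIN_LEFT:
            TRY(1, JOIN_LEFT);
            TRY(2, JOIN_RIGHT);          /* same join, inputs swapped */
            break;
        case JOIN_FULL:
            TRY(1, JOIN_FULL);
            TRY(2, JOIN_FULL);
            break;
        case JOIN_SEMI:
            TRY(1, JOIN_SEMI);
            TRY(1, JOIN_DEDUP_SEMI);
            TRY(2, JOIN_DEDUP_SEMI_REVERSE);
            if (rhs_can_be_unique)
            {
                TRY(1, JOIN_UNIQUE_INNER);
                TRY(2, JOIN_UNIQUE_OUTER);
            }
            break;
        case JOIN_ANTI:
        case JOIN_LASJ_NOTIN:
            TRY(1, jt);
            break;
        default:
            break;                       /* not handled in this sketch */
    }
#undef TRY
    return n;
}
```

For a JOIN_SEMI whose RHS can be made unique, the sketch yields five attempts, three of which are planner-only codes that never reach the executor as such.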