SQL 语句执行计划中的连接方式
join操作
join操作基本分为3大类:外连接(细分为:左连接(Left outer join/ left join)、右连接(right outer join/ right join)、全连接(full outer join/ full join))、自然连接(Natural join)、内连接(Inner Join/join)。
测试环境:
CREATE TABLE t01(id1 int,id2 int,id3 int);
CREATE TABLE t02(id1 int,id4 int,id5 int);
INSERT INTO t01 VALUES(1,1,1);
INSERT INTO t01 VALUES(2,2,2);
INSERT INTO t01 VALUES(3,3,3);
INSERT INTO t02 VALUES(1,4,4);
INSERT INTO t02 VALUES(2,2,2);
INSERT INTO t02 VALUES(4,6,6);
commit;
SELECT * FROM t01;
SELECT * FROM t02;
ITMS5_1@hfzcdb> SELECT * FROM t01;
ID1 ID2 ID3
---------- ---------- ----------
1 1 1
2 2 2
3 3 3
ITMS5_1@hfzcdb> SELECT * FROM t02;
ID1 ID4 ID5
---------- ---------- ----------
1 4 4
2 2 2
4 6 6
内连接:
ITMS5_1@hfzcdb> SELECT * from t01 INNER JOIN t02 ON t01.id1=t02.id1;
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
2 2 2 2 2 2
#或者:
ITMS5_1@hfzcdb> SELECT * from t01 JOIN t02 ON t01.id1=t02.id1;
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
2 2 2 2 2 2
自然连接
自然连接是在两张表中寻找那些数据类型和列名都相同的字段,然后自动地将他们连接起来,并返回所有符合条件按的结果。
我们也可以将自然连接理解为内连接的一种。
ITMS5_1@hfzcdb> SELECT * from t01 Natural JOIN t02;
ID1 ID2 ID3 ID4 ID5
---------- ---------- ---------- ---------- ----------
1 1 1 4 4
2 2 2 2 2
左外连接:
ITMS5_1@hfzcdb> SELECT * FROM t01 a LEFT OUTER JOIN t02 b ON a.id1=b.id1;
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
2 2 2 2 2 2
3 3 3
右外连接
ITMS5_1@hfzcdb> SELECT * FROM t01 a RIGHT OUTER JOIN t02 b ON a.id1=b.id1;
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
2 2 2 2 2 2
4 6 6
全外连接
ITMS5_1@hfzcdb> SELECT * FROM t01 a FULL OUTER JOIN t02 b ON a.id1=b.id1; # outer可以省略
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
2 2 2 2 2 2
4 6 6
3 3 3
交叉连接(笛卡尔积)
SELECT * FROM t01 CROSS JOIN t02;ITMS5_1@hfzcdb> SELECT * FROM t01 CROSS JOIN t02;
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
1 1 1 2 2 2
1 1 1 4 6 6
2 2 2 1 4 4
2 2 2 2 2 2
2 2 2 4 6 6
3 3 3 1 4 4
3 3 3 2 2 2
3 3 3 4 6 6
ITMS5_1@hfzcdb> select * from t01,t02;
ID1 ID2 ID3 ID1 ID4 ID5
---------- ---------- ---------- ---------- ---------- ----------
1 1 1 1 4 4
1 1 1 2 2 2
1 1 1 4 6 6
2 2 2 1 4 4
2 2 2 2 2 2
2 2 2 4 6 6
3 3 3 1 4 4
3 3 3 2 2 2
3 3 3 4 6 6
嵌套循环(Nested Loops):简称NL
- nestloop 适用于大小表关联 小表做外表(驱动表,放内存中),外表(被驱动表)每返回一行数据,内表需要做一次全表扫描,该场景下适合再内表的关联键上建立索引,避免内表的多次全表扫描
select *
from
t_vio_violation v1, t_vio_white_vehicle v2
where
v1.plate_nbr=v2.plate_nbr and v1.violation_time > sysdate -100
Plan Hash Value : 284138460
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 886 | 15 | 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 886 | 15 | 00:00:01 |
| 2 | NESTED LOOPS | | 33 | 886 | 15 | 00:00:01 |
| 3 | TABLE ACCESS FULL | T_VIO_WHITE_VEHICLE | 1 | 359 | 2 | 00:00:01 |
| * 4 | INDEX RANGE SCAN | IDX_PLATE_NBR_01 | 33 | | 2 | 00:00:01 |
| * 5 | TABLE ACCESS BY INDEX ROWID | T_VIO_VIOLATION | 8 | 4216 | 13 | 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 4 - access("V1"."PLATE_NBR"="V2"."PLATE_NBR")
* 5 - filter("V1"."VIOLATION_TIME">SYSDATE@!-100)
小表做外表(驱动表,放内存中)–T_VIO_WHITE_VEHICLE ,外表(被驱动表)–T_VIO_VIOLATION
遵循最上最右先执行的原则。
(归并)排序合并连接(Sort Merge Join):SMJ
merge join 因为要排序,因此性能要差于hash join,若关联键上有索性,性能也不错, 适用于关联键已有索引并且支持不等值连接 <= >=
【排序合并连接分为两个阶段】
1、Sort 阶段:两边集合按照连接字段进行排序。
2、Merge 阶段:排序好的两边集合进行相互合并(Merge)操作。
select /*+ ordered use_merge(v2) */*
from
t_vio_violation v1, t_vio_white_vehicle v2
where
v1.violation_time > sysdate -100
and
v1.plate_nbr=v2.plate_nbr
Plan Hash Value : 3221625541
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 886 | 523280 | 00:00:41 |
| 1 | MERGE JOIN | | 1 | 886 | 523280 | 00:00:41 |
| 2 | SORT JOIN | | 6026069 | 3175738363 | 523277 | 00:00:41 |
| * 3 | TABLE ACCESS FULL | T_VIO_VIOLATION | 6026069 | 3175738363 | 103631 | 00:00:09 |
| * 4 | SORT JOIN | | 1 | 359 | 3 | 00:00:01 |
| 5 | TABLE ACCESS FULL | T_VIO_WHITE_VEHICLE | 1 | 359 | 2 | 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 3 - filter("V1"."VIOLATION_TIME">SYSDATE@!-100)
* 4 - access("V1"."PLATE_NBR"="V2"."PLATE_NBR")
* 4 - filter("V1"."PLATE_NBR"="V2"."PLATE_NBR")
(散列)哈希连接(Hash Join):简称HJ
hash join 仅适用于等值关联,两表中较小的表的关联键放内存中做hash散列再去窥探大表,性能较好
【哈希连接分为两个阶段】
1、Build 阶段:读取小表(Build Input)生成Hash表。 —构造阶段
2、Probe 阶段:读取大表(Probe Input)探查Hash表并进行连接。 --探查阶段
select /*+ ordered use_hash(v2) */*
from
t_vio_violation v1, t_vio_white_vehicle v2
where
v1.violation_time > sysdate -100
and
v1.plate_nbr=v2.plate_nbr
Plan Hash Value : 1360093866
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 886 | 235876 | 00:00:19 |
| * 1 | HASH JOIN | | 1 | 886 | 235876 | 00:00:19 |
| * 2 | TABLE ACCESS FULL | T_VIO_VIOLATION | 6026027 | 3175716229 | 103631 | 00:00:09 |
| 3 | TABLE ACCESS FULL | T_VIO_WHITE_VEHICLE | 1 | 359 | 2 | 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("V1"."PLATE_NBR"="V2"."PLATE_NBR")
* 2 - filter("V1"."VIOLATION_TIME">SYSDATE@!-100)
三种连接方式比较
NL连接 | sort Merge连接 | Hash连接 |
---|---|---|
海量数据连接慢 | 海量数据连接比较快 | 海量数据连接很快 |
特别依赖索引 | 不太依赖索引,有索引排序会快 | 不是太依赖索引,索引快速过滤出结果 |
随机方式扫描数据 | 不全是随机方式扫描数据 | 不全是随机方式扫描数据 |
从SGA的buffer cache读取数据 | 从PGA读取排序后的数据 | 从PGA读取Hash表数据 |
被驱动表需要扫描多次 | outer表与inner表都只扫描一次 | outer表与inner表都只扫描一次 |
不需要排序 | 需要排序的数据也是从buffer cache读取,不可避免 | 需要构建的数据也是从buffer cache读取,不可避免 |
两个表都要排序 | 不需要排序 | |
两个表都要放到PGA,使用大量PGA 空间 | 只把小表放在PGA中 |