筛选条件在on和where中的区别（基于hivesql）

news2025/7/15 2:38:10

理解筛选条件在on和where中的区别，最好先理解sql的执行顺序，尽管实际执行时不同的物理执行引擎可能会有特定的优化，但是逻辑执行顺序必须遵循：

1）from：确定数据源是什么，from后可以是单表，可以是多表的join操作。
2）where：对from的数据源进行筛选。
3）group by：对where筛选后的结果分组。
4）having：对group by分组后的结果进行过滤（注意此时过滤的数据粒度不是单条记录，而是整个分组）。
5）select：选择结果字段（窗口函数的执行时机也在此时）。
6）order by：对select的结果执行排序。
7）limit：限制最终的结果数量。

所以从执行顺序可以看出筛选条件放在where和on中，最本质的区别是过滤的时机不同，on中的过滤发生在join的过程中，影响的参与join的记录，而where中的过滤是对join之后的结果集进行过滤。

筛选条件放在on中：

drop view if exists employees;
create temporary view employees as
select 1 as emp_id, '张三' as name, 10 as dept_id
union all
select 2, '李四', 20
union all
select 3, '王五', null;

drop view if exists departments;
create temporary view departments as
select 10 as dept_id, '技术部' as dept_name, 'active' as status
union all
select 20, '市场部', 'inactive'
union all
select 30, '财务部', 'active';

select 
  *
from employees e
left join departments d on e.dept_id=d.dept_id and d.status='active';

在这里插入图片描述
保留了左表的全部记录，逻辑上和先从右表筛选，拿筛选后的结果和左表关联的效果一样。

放在where中：

drop view if exists employees;
create temporary view employees as
select 1 as emp_id, '张三' as name, 10 as dept_id
union all
select 2, '李四', 20
union all
select 3, '王五', null;

drop view if exists departments;
create temporary view departments as
select 10 as dept_id, '技术部' as dept_name, 'active' as status
union all
select 20, '市场部', 'inactive'
union all
select 30, '财务部', 'active';

select 
  *
from employees e
left join departments d on e.dept_id=d.dept_id where d.status='active';