思维导图
判断函数
if(expr,v1,v2): 表达式结果为true返回v1,否则返回v2
ifnull(列名,dv): 列值为null返回dv,否则返回列值.
nullif(expr1,expr2): 表达式1=表达式2返回null,不等于返回表达式1的值.
窗口函数
作用: 可以为表新增一列,新增的列是什么取决于over()函数前面的函数.
主要函数是: over(parttion by 列名 order by 列名 [窗口范围])
结合函数:
1.聚合函数:sum(),count(),avg(),max(),min()
2.序号函数:rank(),row_number(),dense_rank()
3.分布函数:percent_rank(),cume_dist()
4.前后函数:lag(),lead()
5.头尾函数:first_value(),last_value()
6.其他函数:nth_value(),ntile()
窗口范围:
rows between 起始行 and 结束行
[
rows between unbound preceding and unbound following #最大窗口
rows between 1 preceding and 1 following # 往上1行和往下一行.
rows between 2 preceding and current row #往上2行到当前行
rows between unbound preceding and current row #往上无边界到当前.
rows between current row and unbound following #当前行到往下无边界
]
序号函数
row_number(): 按照 1,2,3,4 标记序号
rank(): 按照 1,2,2,4 标记序号
dense_rank(): 按照 1,2,2,3 标记序号
- 需求 分组后排序
select
e.dname,
e.ename,
e.salary,
row_number() over (partition by e.dname order by e.salary) as rn -- 1234[5,6,6,7]
from employee e ;
select
e.dname,
e.ename,
e.salary,
rank() over (partition by e.dname order by e.salary) as rn -- 1224[5,6,6,7]
from employee e ;
select
e.dname,
e.ename,
e.salary,
dense_rank() over (partition by e.dname order by e.salary) as rn-- 1223[5,6,6,7]
from employee e ;
聚合函数结合
聚合函数:sum(),count(),avg(),max(),min()
窗口大小:
1.如果只写over(): 此时窗口是第一行到最后一行(最大窗口)
2.如果写了over(partition by 列) : 此时窗口大小是分组内的第一行到最后一行
3.如果写了over(partition by 列 order by 列 ): 窗口大小是分组内当前行到往上无边界
4.如果不满意上方的默认窗口,怎么办??? :
通过over(partition by dname order by hiredate rows between 1 preceding and 1 following)指定窗口.
select
dname,
ename,
salary,
sum(salary) over(partition by dname order by hiredate rows between 1 preceding and 1 following) as pv1,
count(1) over(partition by dname order by hiredate rows between unbounded preceding and unbounded following) as pv2,
avg(salary) over (partition by dname) as pv3
from employee;
前后函数
lag(显示的列,往前n条,找不到默认值)
lead(显示的列,往后n条,找不到默认值)
select
dname,
ename,
hiredate,
salary,
lag(hiredate,1,'默认时间') over(partition by dname order by hiredate) as last_1_time,
lag(hiredate,2) over(partition by dname order by hiredate) as last_2_time
from employee;
---------------------------------------------------------
select
dname,
ename,
hiredate,
salary,
lead(hiredate,1,'默认时间') over(partition by dname order by hiredate) as last_1_time,
lead(hiredate,2) over(partition by dname order by hiredate) as last_2_time
from employee;
首尾函数
-- todo 首尾函数----------first_value默认窗口是:分组内当前行到往上无边界-------------
-- todo 首尾函数----------last_value默认窗口是:分组内当前行到往上无边界-------------
-- 注意, 如果不指定ORDER BY,则进行排序混乱,会出现错误的结果
-- 需求1 : 截止到当前入职的人员.按照日期排序查询第1个入职和最后1个入职员工的薪资
select
dname,
ename,
hiredate,
salary,
first_value(salary) over(partition by dname order by hiredate) as first,
last_value(salary) over(partition by dname order by hiredate) as last
from employee;
-- 需求1 : 统计各部门全部数据.按照日期排序查询第1个入职和最后1个入职员工的薪资
select
dname,
ename,
hiredate,
salary,
first_value(salary) over(partition by dname order by hiredate) as first,
last_value(salary) over(partition by dname order by hiredate rows between unbounded preceding and unbounded following) as last
from employee;
其它函数
NTH_VALUE(expr,n): 用途:返回窗口中第n个expr的值。expr可以是表达式,也可以是列名
NTILE(n):用途:将分区中的有序数据分为n个等级,记录等级数
-- todo nth_value(列,第几条数据)
-- 查询每个部门截止目前薪资排在第二和第三的员工信息
select
dname,
ename,
hiredate,
salary,
nth_value(salary,2) over(partition by dname order by hiredate) as second_score,
nth_value(salary,3) over(partition by dname order by hiredate) as third_score
from employee;
-- todo ntile()根据入职日期将每个部门的员工分成3组
select
dname,
ename,
hiredate,
salary,
ntile(3) over(partition by dname order by hiredate ) as rn
from employee;