Hive Window Functions
Window functions
Data preparation
1 jx 20
2 zx 24
3 yx 18
4 wz 10
5 yy 34
6 wy 25
create table t (
  id int,
  name string,
  age int
)
row format delimited fields terminated by ' ';
load data inpath '/data/data.txt' into table t;
ROW_NUMBER
ROW_NUMBER starts at 1 and assigns a sequential number to each record within its partition, following the order by clause.
select id, name, age, row_number() over(order by age desc) num from t;
Result:
id name age num
5 yy 34 1
6 wy 25 2
2 zx 24 3
1 jx 20 4
3 yx 18 5
4 wz 10 6
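As a rough illustration of what ROW_NUMBER computes, here is a minimal Python sketch (not Hive code). The `rows` list copies the sample data; the helper name `row_number_desc` is made up for this example. It sorts one partition by age in descending order and numbers the rows from 1.

```python
# Sample rows from the data preparation step: (id, name, age).
rows = [
    (1, "jx", 20), (2, "zx", 24), (3, "yx", 18),
    (4, "wz", 10), (5, "yy", 34), (6, "wy", 25),
]

def row_number_desc(rows):
    """Emulate row_number() over(order by age desc) for a single partition."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    # enumerate(..., start=1) plays the role of ROW_NUMBER: 1, 2, 3, ...
    return [(id_, name, age, num)
            for num, (id_, name, age) in enumerate(ordered, start=1)]

if __name__ == "__main__":
    for row in row_number_desc(rows):
        print(*row)
```

All ages in the sample are distinct, so the numbering is unambiguous. When values tie, ROW_NUMBER still hands out distinct numbers, but which tied row comes first is not guaranteed.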
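RANK and DENSE_RANK
RANK gives each row its rank within the partition; tied rows share a rank, and the ranks that follow skip ahead, leaving gaps.
DENSE_RANK gives each row its rank within the partition; tied rows share a rank, and no gaps are left.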
select
id, name, age,
rank() over(order by age desc) num1,
dense_rank() over(order by age desc) num2,
row_number() over(order by age desc) num3
from t;
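Result (data was inserted partway through without overwriting the original rows, which is why rows are duplicated; this does not affect what the output demonstrates):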
id name age num1 num2 num3
5 yy 34 1 1 1
5 yy 34 1 1 2
6 wy 25 3 2 3
6 wy 25 3 2 4
2 zx 24 5 3 5
2 zx 24 5 3 6
1 jx 20 7 4 7
7 hn 20 7 4 8
1 jx 20 7 4 9
3 yx 18 10 5 10
3 yx 18 10 5 11
4 wz 10 12 6 12
4 wz 10 12 6 13
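The difference between the three columns can be sketched in plain Python. This is an illustrative emulation rather than Hive's implementation, and `rank_columns` is a hypothetical helper name. It works on the ages only, since the order of tied rows is not fixed.

```python
def rank_columns(ages):
    """Return (age, rank, dense_rank, row_number) for ages sorted descending."""
    ordered = sorted(ages, reverse=True)
    result = []
    rank = dense = 0
    prev = None
    for position, age in enumerate(ordered, start=1):
        if age != prev:
            rank = position   # RANK jumps to the row position, leaving gaps
            dense += 1        # DENSE_RANK advances by one per distinct value
            prev = age
        # ROW_NUMBER is simply the position, ties or not.
        result.append((age, rank, dense, position))
    return result

# The 13 ages from the duplicated table shown above.
ages = [34, 34, 25, 25, 24, 24, 20, 20, 20, 18, 18, 10, 10]

if __name__ == "__main__":
    for row in rank_columns(ages):
        print(*row)
```

The three rows with age 20 all get rank 7 and dense rank 4; the next age (18) then gets rank 10 but dense rank 5.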
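Analytic window functions
SUM
The result depends on order by, which sorts in ascending order by default. With order by present, each row gets a running total up to and including itself. The results from here on come from a seven-row table: the six original rows plus 7 hn 20.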
select id,name,age,sum(age)over(order by age) sum from t;
Result:
id name age sum
4 wz 10 10
3 yx 18 28
1 jx 20 68
7 hn 20 68
2 zx 24 92
6 wy 25 117
5 yy 34 151
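Both rows with age 20 show 68 because, when order by is given without an explicit frame, Hive defaults to range between unbounded preceding and current row, and a range frame treats rows with equal order by values as peers. A small Python sketch of that behaviour (an emulation for illustration; `running_sum_range` is a name invented here):

```python
def running_sum_range(values):
    """Emulate sum(x) over(order by x) with the default RANGE frame.

    Each row sums every value less than or equal to its own, so rows
    with the same value (peers) receive the same total.
    """
    ordered = sorted(values)
    return [(v, sum(u for u in ordered if u <= v)) for v in ordered]

# Ages of the seven rows in the table at this point.
ages = [20, 24, 18, 10, 34, 25, 20]

if __name__ == "__main__":
    for age, total in running_sum_range(ages):
        print(age, total)
```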
Without order by, the sum covers all rows in the partition by default.
select id,name,age,sum(age)over() sum from t;
Result:
id name age sum
1 jx 20 151
2 zx 24 151
3 yx 18 151
4 wz 10 151
5 yy 34 151
6 wy 25 151
7 hn 20 151
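If rows between is not specified but order by is, the window runs from the start of the partition to the current row. Strictly speaking, the default is range between unbounded preceding and current row, so rows with equal order by values share one result, as the two age-20 rows (68) do in the first SUM example.
Meaning of the rows between keywords:
- preceding : rows before the current row
- following : rows after the current row
- current row : the current row
- unbounded : no limit in that direction
- unbounded preceding : from the first row of the partition
- unbounded following : to the last row of the partition
Sum from the start of the partition to the current row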
select id, name, age, sum(age)over(order by age rows between unbounded preceding and current row) sum from t;
Result:
id name age sum
4 wz 10 10
3 yx 18 28
1 jx 20 48
7 hn 20 68
2 zx 24 92
6 wy 25 117
5 yy 34 151
Sum over the three preceding rows, the current row, and the next row
select id, name, age, sum(age)over(order by age rows between 3 preceding and 1 following) sum from t;
Result:
id name age sum
4 wz 10 28
3 yx 18 48
1 jx 20 68
7 hn 20 92
2 zx 24 107
6 wy 25 123
5 yy 34 103
Sum from the current row to the end of the partition
select id, name, age, sum(age)over(order by age rows between current row and unbounded following) sum from t;
Result:
id name age sum
4 wz 10 151
3 yx 18 141
1 jx 20 123
7 hn 20 103
2 zx 24 83
6 wy 25 59
5 yy 34 34
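The three rows between frames above can be reproduced with one short Python function. This is a sketch for checking the arithmetic, not how Hive evaluates frames. The function name `rows_frame_sum` and the convention of passing None for unbounded and 0 for current row are assumptions made for this example.

```python
def rows_frame_sum(values, preceding, following):
    """Sum over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    None stands for UNBOUNDED and 0 stands for CURRENT ROW.
    Rows are ordered ascending, as in order by age.
    """
    ordered = sorted(values)
    n = len(ordered)
    sums = []
    for i in range(n):
        # Clamp the frame to the partition boundaries.
        start = 0 if preceding is None else max(0, i - preceding)
        end = n if following is None else min(n, i + following + 1)
        sums.append(sum(ordered[start:end]))
    return sums

ages = [20, 24, 18, 10, 34, 25, 20]

if __name__ == "__main__":
    print(rows_frame_sum(ages, None, 0))   # unbounded preceding .. current row
    print(rows_frame_sum(ages, 3, 1))      # 3 preceding .. 1 following
    print(rows_frame_sum(ages, 0, None))   # current row .. unbounded following
```

Unlike the default range frame, a rows frame counts physical rows, so the two age-20 rows get different running totals (48 and 68).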
avg, min, and max are used the same way as sum.
Only one more example, avg, is shown here.
select id,name,age,avg(age)over(order by age) sum from t;
Result:
id name age sum
4 wz 10 10.0
3 yx 18 14.0
1 jx 20 17.0
7 hn 20 17.0
2 zx 24 18.4
6 wy 25 19.5
5 yy 34 21.571428571428573
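To see how avg, min, and max behave under the same default frame, the following Python sketch applies an arbitrary aggregate to the frame of each row. It is an emulation under the assumption of the default range frame (all rows with a value less than or equal to the current one); `range_frame_apply` and `avg` are hypothetical helper names.

```python
def avg(xs):
    """Arithmetic mean as a float, like avg() on an int column."""
    return sum(xs) / len(xs)

def range_frame_apply(values, agg):
    """Apply agg over the default frame of each row for over(order by x).

    The frame of a row holds every value <= the row's own value,
    so tied rows see the same frame and get the same result.
    """
    ordered = sorted(values)
    return [agg([u for u in ordered if u <= v]) for v in ordered]

ages = [20, 24, 18, 10, 34, 25, 20]

if __name__ == "__main__":
    print(range_frame_apply(ages, avg))
    print(range_frame_apply(ages, min))
    print(range_frame_apply(ages, max))
```

With an ascending order by, min stays at the smallest value in the partition for every row, while max equals the current row's own value.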