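Window Function Definition
Window function: defines a window over the rows related to the current row and operates on that set of values. It does not need a GROUP BY clause to group the data, and it can return the base row's columns and aggregate columns together in the same row.
Key point:
A window function returns: base row columns + aggregate columns.
Let's look at an example.
Sample Data
0. Requirement
#Base data (sex: 男 = male, 女 = female)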
id name sex age
1 a 男 20
2 b 女 10
3 c 男 30
4 d 男 15
5 e 女 22
#Expected result (cnt is the aggregate column)
id name sex age cnt
1 a 男 20 5
2 b 女 10 5
3 c 男 30 5
4 d 男 15 5
5 e 女 22 5
1. Table definition
CREATE TABLE `student` (
`id` int NOT NULL,
`name` varchar(255) DEFAULT NULL,
`sex` varchar(255) DEFAULT NULL,
`age` int DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
2. Data preparation
INSERT INTO `test`.`student` (`id`, `name`, `sex`, `age`) VALUES (1, 'a', '男', 20);
INSERT INTO `test`.`student` (`id`, `name`, `sex`, `age`) VALUES (2, 'b', '女', 10);
INSERT INTO `test`.`student` (`id`, `name`, `sex`, `age`) VALUES (3, 'c', '男', 30);
INSERT INTO `test`.`student` (`id`, `name`, `sex`, `age`) VALUES (4, 'd', '男', 15);
INSERT INTO `test`.`student` (`id`, `name`, `sex`, `age`) VALUES (5, 'e', '女', 22);
3. Implementation without a window function
select
student.*,
student_total.cnt
from
student
cross join
(select count(*) as cnt from student) student_total
4. Implementation with a window function
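A minimal sketch of the window-function version, under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, and it needs SQLite 3.25 or later for window function support. The query `count(*) OVER ()` is one plausible way to produce the expected result above, not necessarily the author's exact statement.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `student` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, sex TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO student (id, name, sex, age) VALUES (?, ?, ?, ?)",
    [(1, "a", "男", 20), (2, "b", "女", 10), (3, "c", "男", 30),
     (4, "d", "男", 15), (5, "e", "女", 22)],
)

# An empty OVER () makes the window the whole result set, so every base row
# keeps its own columns and also carries the total row count.
rows = conn.execute(
    "SELECT student.*, count(*) OVER () AS cnt FROM student ORDER BY id"
).fetchall()

for row in rows:
    print(row)
```

Compared with the join against a one-row subquery, the table is referenced only once and no GROUP BY or self-join is needed.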
In Practice
Practice 1: Local asset library
Required model:
video(id,created_at,signature)
Requirement 1: deduplicate video (the local asset library)
select min(id) from video group by signature
Requirement 2: deduplicate video (the local asset library) and get the id of the first upload
Without a window function:
SELECT
video.id,
video.created_at,
video.signature
FROM
video
JOIN ( SELECT signature, min( created_at ) as created_at FROM video GROUP BY signature ) tmp
ON video.signature = tmp.signature and video.created_at = tmp.created_at
With a window function:
SELECT
*
FROM
(
SELECT
video.id,
video.created_at,
video.signature,
row_number () over ( PARTITION BY signature ORDER BY created_at ) AS rn
FROM
video
) tmp
WHERE
tmp.rn = 1
Requirement 3: deduplicate video (the local asset library) and get the id of the second upload
Without a window function:
???
With a window function (note the difference from the previous window query):
SELECT
*
FROM
(
SELECT
video.id,
video.created_at,
video.signature,
row_number () over ( PARTITION BY signature ORDER BY created_at ) AS rn
FROM
video
) tmp
WHERE
tmp.rn = 2
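The two window queries differ only in the `rn` filter. The sketch below runs both against a small in-memory table to make that visible. The sample rows and the helper name `nth_upload` are hypothetical, and it assumes Python's sqlite3 module backed by SQLite 3.25 or later rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video (id INTEGER PRIMARY KEY, created_at TEXT, signature TEXT)"
)
# Hypothetical data: sig_a was uploaded three times, sig_b only once.
conn.executemany(
    "INSERT INTO video (id, created_at, signature) VALUES (?, ?, ?)",
    [
        (1, "2021-11-01 10:00:00", "sig_a"),
        (2, "2021-11-02 10:00:00", "sig_a"),
        (3, "2021-11-03 10:00:00", "sig_a"),
        (4, "2021-11-01 12:00:00", "sig_b"),
    ],
)

def nth_upload(n):
    """Return (id, signature) of the n-th upload for each signature."""
    sql = """
        SELECT tmp.id, tmp.signature
        FROM (
            SELECT id, signature,
                   row_number() OVER (PARTITION BY signature ORDER BY created_at) AS rn
            FROM video
        ) tmp
        WHERE tmp.rn = ?
        ORDER BY tmp.signature
    """
    return conn.execute(sql, (n,)).fetchall()

first_uploads = nth_upload(1)
second_uploads = nth_upload(2)
print(first_uploads)   # [(1, 'sig_a'), (4, 'sig_b')]
print(second_uploads)  # [(2, 'sig_a')]
```

A signature that was uploaded only once (sig_b here) has no row with `rn = 2`, so it drops out of the second result.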
Practice 2: Per-segment sum and running total
#Running total
#id hour segment_cost cumulative_cost
#1 0 10 10
#1 0 20 30
#1 0 30 60
#Per-segment sum
select
customerId,
stat_hour,
sum(costFormat) as total
from
micro.etl_xm_campaign_charge_day_hour
where stat_date = '2021-11-30' and customerId = 35702
group by customerId,stat_hour
order by stat_hour
#Running total
select
customerId,
stat_hour,
sum(costFormat) over(partition by customerId order by stat_hour rows between unbounded preceding and current row) as total
from
(
select
customerId,
stat_hour,
sum(costFormat) as costFormat
from
micro.etl_xm_campaign_charge_day_hour
where stat_date = '2021-11-30' and customerId = 35702
group by customerId,stat_hour
)t
order by stat_hour
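To see the per-segment sum and the running total side by side, here is a small runnable sketch. It assumes an in-memory SQLite database (version 3.25 or later) via Python's sqlite3 module. The table name `charge` and its rows are hypothetical stand-ins for `micro.etl_xm_campaign_charge_day_hour`, chosen so that the hourly sums are 10, 20 and 30.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charge (customerId INTEGER, stat_hour INTEGER, costFormat INTEGER)"
)
# Hypothetical rows: hour 0 has two charge records, hours 1 and 2 have one each.
conn.executemany(
    "INSERT INTO charge VALUES (?, ?, ?)",
    [(35702, 0, 4), (35702, 0, 6), (35702, 1, 20), (35702, 2, 30)],
)

totals = conn.execute("""
    SELECT customerId,
           stat_hour,
           costFormat AS segment_total,
           sum(costFormat) OVER (
               PARTITION BY customerId
               ORDER BY stat_hour
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM (
        -- Inner query: the per-segment (per-hour) sum
        SELECT customerId, stat_hour, sum(costFormat) AS costFormat
        FROM charge
        GROUP BY customerId, stat_hour
    ) t
    ORDER BY stat_hour
""").fetchall()

for row in totals:
    print(row)
# (35702, 0, 10, 10)
# (35702, 1, 20, 30)
# (35702, 2, 30, 60)
```

The inner GROUP BY collapses the records to one row per hour. The window function then accumulates those hourly sums without collapsing any further rows.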
Extension: cascading sum
Cascading sum: show the per-segment sum and the running total together in the same row.
select
tmp2.userId,
tmp2.month,
tmp2.total_month,
sum(tmp2.total_month) over(partition by tmp2.userId order by tmp2.month rows between unbounded preceding and current row) as total_cumulative
from
(select
tmp1.userId,
tmp1.month,
sum(tmp1.visitCount) as total_month
from
(select
userId,
date_format(replace(visitDate,"/","-"),"yyyy-MM") as month, -- "yyyy-MM" is a Hive/Spark-style pattern; MySQL's date_format expects '%Y-%m'
visitCount
from
visit
)tmp1
group by tmp1.userId,tmp1.month
)tmp2
;
Key point: the range given by rows inside over(), i.e. the window frame.
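The effect of the `rows` range is easiest to see by running the same `sum` over different frames. The sketch below is illustrative only. The table `hourly_cost` and its values are hypothetical, and it assumes Python's sqlite3 module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hourly_cost (stat_hour INTEGER, cost INTEGER)")
conn.executemany(
    "INSERT INTO hourly_cost VALUES (?, ?)",
    [(0, 10), (1, 20), (2, 30), (3, 40)],
)

frames = conn.execute("""
    SELECT stat_hour,
           cost,
           -- From the first row up to the current row: running total
           sum(cost) OVER (ORDER BY stat_hour
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
           -- Previous row plus the current row: sliding sum over two rows
           sum(cost) OVER (ORDER BY stat_hour
                           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS last_two,
           -- Every row in the partition: grand total repeated on each row
           sum(cost) OVER (ORDER BY stat_hour
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole
    FROM hourly_cost
    ORDER BY stat_hour
""").fetchall()

for row in frames:
    print(row)
# (0, 10, 10, 10, 100)
# (1, 20, 30, 30, 100)
# (2, 30, 60, 50, 100)
# (3, 40, 100, 70, 100)
```

The aggregate function and the ordering are identical in all three columns. Only the frame changes which rows are summed.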
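Key Concepts: Window function categories
- Ranking window functions
  - row_number: assigns a consecutive sequence number to the rows of each partition, in order. Ties still get distinct numbers, e.g. 1,2,3.
  - dense_rank: numbers the rows of each partition in order. Equal values share the same number, and the next number follows without a gap, e.g. 1,1,2.
  - rank: numbers the rows of each partition in order. Equal values share the same number, and the next number skips ahead by the number of tied rows, e.g. 1,1,3.
- Aggregate window functions
  - sum, avg, max, min, count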
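The difference between the three ranking functions shows up only when there are ties. A minimal sketch follows, assuming a hypothetical `exam` table with two equal scores, run through Python's sqlite3 module (SQLite 3.25 or later). A single partition is used to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam (name TEXT, score INTEGER)")
# Hypothetical data: a and b are tied at 90.
conn.executemany(
    "INSERT INTO exam VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],
)

ranking = conn.execute("""
    SELECT name,
           score,
           row_number() OVER (ORDER BY score DESC) AS row_num,
           rank()       OVER (ORDER BY score DESC) AS rnk,
           dense_rank() OVER (ORDER BY score DESC) AS dense_rnk
    FROM exam
    ORDER BY score DESC, name
""").fetchall()

for row in ranking:
    print(row)

# rank() leaves a gap after the tie, dense_rank() does not.
print([r[3] for r in ranking])  # [1, 1, 3, 4]
print([r[4] for r in ranking])  # [1, 1, 2, 3]
```

row_number still hands out 1 and 2 to the two tied rows, but which of them gets 1 is not guaranteed unless the window's ORDER BY includes a tie-breaker column.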