Advanced SQL, Day 11: Window Functions

news2024/10/6 20:25:36

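Contents

1 Dedicated window functions

1.1 Top 3 scores for each exam category

1.2 Exams where the gap between the second-slowest and second-fastest completion times exceeds half the exam duration

1.3 Maximum time window between two consecutive exam attempts

1.4 Completion stats for users with zero unfinished exams in their most recent three months

1.5 Recent three-month exam activity for the 50% of users with the highest incompletion rate

2 Aggregate window functions

2.1 Min-max normalization of exam scores

2.2 Monthly attempt count and cumulative attempt count for each exam

2.3 Monthly and cumulative answering activity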

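1 Dedicated window functions

1.1 Top 3 scores for each exam category

My code: the filtering is hard, and I don't understand what the problem is asking.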

select tag tid,uid,rank()over(partition by tag order by score desc ) ranking
from examination_info ei join exam_record er
on ei.exam_id = er.exam_id
limit 3

Correct code:

select *
from (select tag tid,
uid,
rank()over(partition by tag order by max(score) desc,min(score) desc,max(uid) desc) ranking
from examination_info ei join exam_record er
on ei.exam_id = er.exam_id
group by tag,uid)t
where ranking<=3

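Review:

(1) Sorting: if two users have the same highest score, the one with the higher lowest score ranks first; if those are also equal, the one with the larger uid ranks first.

ORDER BY MAX(score) desc ,MIN(score) desc,uid desc

(2) Window functions

[Ranking window functions]

●   rank() over(): 1, 1, 3, 4 (ties share a rank, and the next rank is skipped)

●   dense_rank() over(): 1, 1, 2, 3 (ties share a rank, with no gap afterward)

●   row_number() over(): 1, 2, 3, 4 (every row gets its own number)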
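The three ranking functions can be compared side by side with a small sketch. It is not part of the original exercise: the table `demo_scores` and its rows are made up, and it runs on SQLite through Python's `sqlite3` module instead of MySQL 8 so that it works without a server (this assumes a Python build whose bundled SQLite is 3.25 or newer).

```python
import sqlite3

# Hypothetical demo data: two users tie on the top score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_scores (uid INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO demo_scores VALUES (?, ?)",
    [(1001, 90), (1002, 90), (1003, 80), (1004, 70)],
)

rows = conn.execute(
    """
    SELECT uid,
           score,
           RANK()       OVER (ORDER BY score DESC)      AS rk,
           DENSE_RANK() OVER (ORDER BY score DESC)      AS dense_rk,
           ROW_NUMBER() OVER (ORDER BY score DESC, uid) AS row_num
    FROM demo_scores
    ORDER BY score DESC, uid
    """
).fetchall()

for row in rows:
    print(row)  # (uid, score, rk, dense_rk, row_num)
```

The tie on 90 is where the three functions diverge, which is why the choice matters when a problem asks for a "top 3".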

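1.2 Exams where the gap between the second-slowest and second-fastest completion times exceeds half the exam duration

My code: I couldn't get it working. I haven't used window functions in a long time, and this problem is hard.

Method 1: max(if)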

select a.exam_id,b.duration,b.release_time  
from
(select exam_id,
row_number() over(partition by exam_id order by timestampdiff(second,start_time,submit_time) desc) rn1,
row_number() over(partition by exam_id order by timestampdiff(second,start_time,submit_time) asc ) rn2,
timestampdiff(second,start_time,submit_time) timex
from exam_record 
where score is not null) a

inner join examination_info b on a.exam_id=b.exam_id
group by a.exam_id
# after if(rn1=2,a.timex,0), the maximum is guaranteed to be the a.timex of the row ranked second
having (max(if(rn1=2,a.timex,0))-max(if(rn2=2,a.timex,0)))/60>b.duration/2 
order by a.exam_id desc

Method 2: the analytic (window) function NTH_VALUE

select distinct c.exam_id,duration,release_time from 
(select a.exam_id, 
nth_value(TIMESTAMPDIFF(minute,start_time,submit_time),2) over (partition by exam_id order by TIMESTAMPDIFF(minute,start_time,submit_time) desc ) as low_2,
nth_value(TIMESTAMPDIFF(minute,start_time,submit_time),2) over (partition by exam_id order by TIMESTAMPDIFF(minute,start_time,submit_time) asc) as fast_2,
duration,release_time
from exam_record a left join examination_info b on a.exam_id = b.exam_id) c 
where low_2-fast_2>duration*0.5
order by exam_id desc;

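Review:

(1) Time-difference function: timestampdiff. For example, timestampdiff(minute,time1,time2) returns time2 - time1, measured in minutes.

(2) How to get the second-largest and second-smallest values: the analytic (window) function NTH_VALUE

NTH_VALUE (measure_expr, n) [ FROM { FIRST | LAST } ][ { RESPECT | IGNORE } NULLS ] OVER (analytic_clause)

This is the general form as written in Oracle's documentation. As far as I know, MySQL 8.0 accepts only FROM FIRST and RESPECT NULLS, which are also the defaults.

(3) About window functions: I just found out that my local database is version 5, and MySQL only supports window functions from 8.0 onward, so I can't practice and work through these queries locally. (I really don't want to upgrade; installation alone is a hassle, and an upgrade would surely throw all kinds of errors.)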
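The NTH_VALUE idea from (2) can be tried without a MySQL 8 install. The sketch below is an illustration under assumptions, not the solution above: `demo_durations` and its values are invented, the completion time is stored directly in minutes instead of being computed with timestampdiff, and it runs on SQLite via Python's `sqlite3`. It also spells out the frame explicitly, so that every row in the partition sees the same second value instead of NULL on the first row.

```python
import sqlite3

# Hypothetical demo data: completion times in minutes for one exam.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_durations (exam_id INTEGER, minutes INTEGER)")
conn.executemany(
    "INSERT INTO demo_durations VALUES (?, ?)",
    [(9001, 10), (9001, 20), (9001, 35), (9001, 50)],
)

rows = conn.execute(
    """
    SELECT DISTINCT
           exam_id,
           NTH_VALUE(minutes, 2) OVER (
               PARTITION BY exam_id ORDER BY minutes DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS slow_2,  -- second-slowest time
           NTH_VALUE(minutes, 2) OVER (
               PARTITION BY exam_id ORDER BY minutes ASC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS fast_2   -- second-fastest time
    FROM demo_durations
    """
).fetchall()

for exam_id, slow_2, fast_2 in rows:
    print(exam_id, slow_2, fast_2, slow_2 - fast_2)
```

With the default frame (up to the current row), the first row of each partition has no second value yet, which is why Method 2 above relies on distinct plus the where filter to drop those rows.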

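1.3 Maximum time window between two consecutive exam attempts

My approach (I couldn't write the code):

(1) First extract each user's attempt time with date_format

(2) Then take the difference, which should be doable with the offset analytic functions:

[Offset analytic functions]

●   lag(field, offset[, default]) over(): takes the value from the row that is "offset" rows before the current row

●   lead(field, offset[, default]) over(): takes the value from the row that is "offset" rows after the current row

Example:

●      ,confirmed: cumulative confirmed cases as of the current day

●      ,lag(confirmed,1)over(partition by name order by whn): cumulative confirmed cases as of the previous day

●      ,(confirmed - lag(confirmed,1)over(partition by name order by whn)): new confirmed cases each day

 (3) Then pick the largest of these differences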
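The confirmed-cases example can be made concrete with a short sketch. The table `demo_cases` and its numbers are invented for illustration, and the query runs on SQLite through Python's `sqlite3` rather than MySQL 8. It also shows the optional third argument of lag, which supplies a default when there is no previous row.

```python
import sqlite3

# Hypothetical demo data: cumulative confirmed cases for one region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_cases (name TEXT, whn TEXT, confirmed INTEGER)")
conn.executemany(
    "INSERT INTO demo_cases VALUES (?, ?, ?)",
    [("A", "2020-03-01", 10), ("A", "2020-03-02", 15), ("A", "2020-03-03", 27)],
)

rows = conn.execute(
    """
    SELECT name,
           whn,
           confirmed,
           -- previous day's cumulative total; NULL on the first day
           LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS prev_confirmed,
           -- daily increase; the default 0 treats the first day as all new
           confirmed - LAG(confirmed, 1, 0) OVER (PARTITION BY name ORDER BY whn) AS new_cases
    FROM demo_cases
    ORDER BY name, whn
    """
).fetchall()

for row in rows:
    print(row)
```

lead works the same way in the opposite direction, which is what the solution below uses to fetch each attempt's next start time.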

Correct code:

select 
    uid,
    max(datediff(next_time,start_time))+1 as days_window,
    round(count(start_time)/(datediff(max(start_time),min(start_time))+1)*(max(datediff(next_time,start_time))+1),2)as avg_exam_cnt
from(
    select 
        uid,
        start_time,
        lead(start_time,1) over(partition by uid order by start_time) as next_time
    from exam_record
    where year(start_time) = '2021'
    )a
group by uid
having count(distinct date(start_time)) > 1
order by days_window desc,avg_exam_cnt desc

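Review:

(1) First build a subquery that returns uid, the start time, and the next start time, restricted to 2021.

The next start time comes from the offset analytic function:

●   lead(field, offset[, default]) over(): takes the value from the row that is "offset" rows after the current row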

select 
    uid,
    start_time,
    lead(start_time,1) over(partition by uid order by start_time) as next_time
from exam_record
where year(start_time) = '2021'

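(2) Maximum time window = max(datediff(next_time,start_time))+1

(3) Average number of exams attempted = number of exams attempted / length of the active period * maximum time window

= 3/7*6 (for example, 3 attempts over a 7-day active period with a maximum window of 6 days)

= count(start_time)/(datediff(max(start_time),min(start_time))+1)*(max(datediff(next_time,start_time))+1)

= round(count(start_time)/(datediff(max(start_time),min(start_time))+1)*(max(datediff(next_time,start_time))+1),2) # round to two decimal places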
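The arithmetic in (2) and (3) can be checked on a tiny data set. This is a sketch under assumptions: `demo_attempts` and its three rows are made up to reproduce the 3/7*6 case, and because SQLite has no datediff, the day difference is emulated with `julianday(date(...))`, which likewise ignores the time of day.

```python
import sqlite3

# Hypothetical demo data: one user, three attempts within seven days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_attempts (uid INTEGER, start_time TEXT)")
conn.executemany(
    "INSERT INTO demo_attempts VALUES (?, ?)",
    [
        (1001, "2021-09-01 09:00:00"),
        (1001, "2021-09-06 10:00:00"),
        (1001, "2021-09-07 08:00:00"),
    ],
)

rows = conn.execute(
    """
    SELECT uid,
           MAX(day_gap) + 1 AS days_window,
           ROUND(
               COUNT(start_time) * 1.0
               / (CAST(julianday(date(MAX(start_time)))
                       - julianday(date(MIN(start_time))) AS INTEGER) + 1)
               * (MAX(day_gap) + 1),
               2
           ) AS avg_exam_cnt
    FROM (
        SELECT uid,
               start_time,
               -- whole days between this attempt and the next one (NULL for the last)
               CAST(julianday(date(LEAD(start_time, 1)
                        OVER (PARTITION BY uid ORDER BY start_time)))
                    - julianday(date(start_time)) AS INTEGER) AS day_gap
        FROM demo_attempts
    ) AS a
    GROUP BY uid
    """
).fetchall()

print(rows)  # one row per user: (uid, days_window, avg_exam_cnt)
```

The gaps are 5 and 1 days, so the window is 5 + 1 = 6; the active period is 6 + 1 = 7 days; and 3 / 7 * 6 rounds to 2.57.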

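(4) Compute time differences with the datediff function rather than by direct subtraction: the results differ.

(5) Difference between the datediff() and timestampdiff() functions

-- Syntax (MySQL): returns expr1 - expr2 in days, comparing only the date parts
DATEDIFF(expr1,expr2)


 SELECT DATEDIFF('2018-05-09 08:00:00','2018-05-09') AS DiffDate;
 -- Result: 0. There is no date difference between 2018-05-09 and 2018-05-09; hours, minutes, and seconds are not compared. The next queries check whether adding a time part changes anything.
 SELECT DATEDIFF('2018-05-09 00:00:00','2018-05-09 23:59:59') AS DiffDate;
 -- Result: 0
 SELECT DATEDIFF('2018-05-08 23:59:59','2018-05-09 00:00:00') AS DiffDate;
 -- Result: -1
 SELECT DATEDIFF('2018-05-09 00:00:00','2018-05-08 23:59:59') AS DiffDate;
-- Result: 1

By contrast, timestampdiff(unit,time1,time2) returns time2 - time1 in whole units and does take the time of day into account, so its argument order is the reverse of datediff's.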

 

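1.4 Completion stats for users with zero unfinished exams in their most recent three months

My code: this is the idea, though it was bound to fail

# first partition by uid and find the users who completed everything
select uid,
rank()over(partition by uid order by start_time) exam_complete
from exam_record
group by uid
having count(start_time) = count(submit_time) # wrong: this is not the count for each uid

# then split by time and find those with more than 3
select uid,
count(exam_complete) exam_complete_cnt
from 
(select uid,
rank()over(partition by uid order by start_time) exam_complete
from exam_record
group by uid
having count(start_time) = count(submit_time))a
where exam_complete_cnt>3

Expert's code: this answer turns out to be very close to mine, so I'll revise mine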

select 
    uid,
    count(start_time) as exam_complete_cnt
from
    (select 
        *,
        dense_rank() over(partition by uid order by date_format(start_time,'%Y%m') desc) as ranking
    from exam_record
    ) a
where ranking <= 3    -- do not write where ranking <= 3 and submit_time is not null here; group by user first, then test in having
group by uid
having count(score) =  count(uid)
order by exam_complete_cnt desc, uid desc

My corrected code:

select uid,
count(start_time) as exam_complete_cnt
from
(select *, # start_time and submit_time are needed later, and select needs uid too, so just return everything with *
dense_rank()over(partition by uid order by date_format(start_time,"%Y%m") desc) ranking
from exam_record)a
where ranking <=3 # count everything in the three most recent months
group by uid
having count(start_time) = count(submit_time)
order by exam_complete_cnt desc, uid desc

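Review:

(1) rank can't be used as the alias here, and putting it in quotes didn't help either. The reason is that RANK is a reserved word in MySQL 8.0 because it is a function name. Renaming the alias to ranking fixed it (quoting it with backticks should also work).

(2) Window functions: I'll wait for a second pass; they're a bit hard.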
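The two ideas in this solution, dense_rank over the month to keep each user's three most recent active months and a having clause that compares counts, can be sketched on toy data. The table `demo_record` and its rows are invented, and SQLite's `strftime('%Y%m', ...)` stands in for MySQL's date_format; the query runs through Python's `sqlite3`.

```python
import sqlite3

# Hypothetical demo data: score is NULL when the exam was not finished.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_record (uid INTEGER, start_time TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO demo_record VALUES (?, ?, ?)",
    [
        (1, "2021-05-02 10:00:00", None),  # unfinished, but outside the last 3 months
        (1, "2021-07-01 10:00:00", 80),
        (1, "2021-08-01 10:00:00", 85),
        (1, "2021-09-01 10:00:00", 90),
        (2, "2021-08-03 10:00:00", 70),
        (2, "2021-09-05 10:00:00", None),  # unfinished inside the last 3 months
    ],
)

rows = conn.execute(
    """
    SELECT uid, COUNT(start_time) AS exam_complete_cnt
    FROM (
        SELECT *,
               DENSE_RANK() OVER (
                   PARTITION BY uid
                   ORDER BY strftime('%Y%m', start_time) DESC
               ) AS ranking
        FROM demo_record
    ) AS a
    WHERE ranking <= 3                   -- keep the three most recent active months
    GROUP BY uid
    HAVING COUNT(score) = COUNT(uid)     -- COUNT skips NULLs, so this means "no unfinished exams"
    ORDER BY exam_complete_cnt DESC, uid DESC
    """
).fetchall()

print(rows)
```

User 1's unfinished attempt falls in a fourth month and is filtered out by ranking, so user 1 qualifies; user 2 has an unfinished attempt inside the window and is dropped by having.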

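1.5 Recent three-month exam activity for the 50% of users with the highest incompletion rate

My code: this is the idea, though it was bound to fail

# first filter for the 50% of users with the highest incompletion rate on SQL exams, at level 6 or 7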
select *,count(er.submit_time)/count(er.start_time) complete_rate,
rank()over(partition by u.uid order by complete_rate) ranking
from examination_info ei join exam_record er
on ei.exam_id = er.exam_id
join user_info u on u.uid = er.uid
group by u.uid
having ei.tag = 'SQL' and u.level in (6,7) and ranking<0.5
# for those users, the number of attempts and completions in each of their three most recent months with attempt records
# full code:
select
uid,
start_month,
count(start_time) total_cnt,
count(submit_time) complete_cnt
from(select *,count(er.submit_time)/count(er.start_time) complete_rate,
rank()over(partition by u.uid order by complete_rate) ranking,
dense_rank()over(partition by uid order by date_format(submit_time,"%Y%m") desc) rankingmonth
from examination_info ei join exam_record er
on ei.exam_id = er.exam_id
join user_info u on u.uid = er.uid
group by u.uid
having ei.tag = 'SQL' and u.level in (6,7) and ranking<0.5)a
where rankingmonth <=3
group by date_format(submit_time,"%Y%m")

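Expert's code: impressive. When will I reach this level?

# Step 1: find the IDs of users in the top 50% by incompletion rate; note that only SQL exams count here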
with rote_tab as 
(select t.uid,t.f_rote,row_number()over(order by t.f_rote desc,uid) as rank2
,count(t.uid)over(partition by t.tag)as cnt
from (select er.uid,ef.tag,(sum(if(submit_time is null,1,0))/count(start_time)) as f_rote
from exam_record er left join examination_info ef 
on ef.exam_id=er.exam_id 
where tag='SQL' 
group by uid ) t)

select  # Step 4: aggregate by user and month; note that the counts cover exams of all types, not only the SQL type used earlier
    uid
    ,start_month
    ,count(start_time) as total_cnt
    ,count(submit_time) as complete_cnt
from 
(
select # Step 3: use a window function to rank each user's months in descending order, so the three most recent months can be found
    uid
    ,start_time
    ,submit_time
    ,date_format(start_time,'%Y%m') as start_month
    ,dense_rank()over(partition by uid order by date_format(start_time,'%Y%m') desc) as rank3
from exam_record 
where uid in 
    (select distinct er.uid
    from exam_record er left join user_info uf on uf.uid=er.uid
    where er.uid in 
    (select uid from rote_tab # reference the common table expression rote_tab
    where rank2<=round(cnt/2,0))
    and uf.level in (6,7))  # Step 2: narrow down to the user IDs at level 6 or 7
) t2
where rank3<=3
group by uid,start_month
order by uid,start_month

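2 Aggregate window functions

2.1 Min-max normalization of exam scores

My failing code: (the score range defaults to [0,100]; if an exam has only one score among its attempt records, the formula isn't needed, and the normalized and scaled score is still the original score) How do I filter that case out?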

select er.uid,er.exam_id,
(score-min(score))/(max(score)-min(score)) avg_new_score
from examination_info ei join exam_record er
using(exam_id)
where difficulty = 'hard'
group by er.uid,er.exam_id
order by er.uid desc,avg_new_score desc

 Expert's code:

# Step 1: compute the max and min scores of the hard exams as max_min_tab
with max_min_tab as 
(select  er.uid,er.exam_id,er.score
    ,max(er.score)over(partition by er.exam_id) as max_score
    ,min(er.score)over(partition by er.exam_id) as min_score
from exam_record er 
left join examination_info ef on er.exam_id=ef.exam_id
where score is not null and difficulty='hard')

select uid,exam_id, # Step 3: take the average and sort
round(avg(new_score)) as avg_new_score
from 
(select uid,exam_id
,if(max_score!=min_score,(score-min_score)/(max_score-min_score)*100,score) as new_score
from max_min_tab) t  # Step 2: apply the normalization over max_min_tab, using if to handle exams that have only one score
group by exam_id,uid
order by exam_id,avg_new_score desc

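Review:

(1) Max and min as window functions: rather than a plain max or min followed by group by, which would collapse the rows

max(er.score)over(partition by er.exam_id) as max_score,

min(er.score)over(partition by er.exam_id) as min_score

(2) Use if to handle the case where an exam has only one score

if(max_score!=min_score,(score-min_score)/(max_score-min_score)*100,score)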
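A small sketch shows why the window form matters: each row keeps its own score while also seeing the exam's max and min. The table `demo_scores` and its values are invented, the query runs on SQLite through Python's `sqlite3`, and a CASE expression replaces MySQL's if(). Multiplying by 100.0 is needed here only because SQLite would otherwise perform integer division.

```python
import sqlite3

# Hypothetical demo data: exam 9001 has three scores, exam 9002 only one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_scores (uid INTEGER, exam_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO demo_scores VALUES (?, ?, ?)",
    [(1001, 9001, 60), (1002, 9001, 80), (1003, 9001, 100), (1004, 9002, 70)],
)

rows = conn.execute(
    """
    SELECT uid,
           exam_id,
           CASE
               WHEN max_score != min_score
                   THEN (score - min_score) * 100.0 / (max_score - min_score)
               ELSE score   -- a single distinct score: keep it unchanged
           END AS new_score
    FROM (
        SELECT uid,
               exam_id,
               score,
               MAX(score) OVER (PARTITION BY exam_id) AS max_score,
               MIN(score) OVER (PARTITION BY exam_id) AS min_score
        FROM demo_scores
        WHERE score IS NOT NULL
    ) AS t
    ORDER BY exam_id, uid
    """
).fetchall()

for row in rows:
    print(row)
```

Without the max_score != min_score guard, the single-score exam would divide by zero.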

2.2 Monthly attempt count and cumulative attempt count for each exam

My code:

select exam_id,
date_format(submit_time,"%Y%m") start_month,
count(submit_time)over(partition by exam_id) month_cnt,
count(submit_time)over(partition by exam_id) cum_exam_cnt # should probably use an offset analytic function
from exam_record

 Expert's code:

select distinct exam_id,
date_format(start_time,'%Y%m') start_month,
count(start_time)over(partition by exam_id,date_format(start_time,"%Y%m")) month_cnt,
count(start_time)over(partition by exam_id order by date_format(start_time,'%Y%m')) cum_exam_cnt 
from exam_record
order by exam_id,start_month

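Review:

(1) distinct is required: a window function does not collapse rows, so without deduplication each combination of exam_id and month is output once for every matching row in the source table.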


(2) Use start_time rather than submit_time: an attempt counts as soon as it has a start_time record.
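Both points in this review can be seen on toy data. The sketch below uses an invented table `demo_attempts`, runs on SQLite via Python's `sqlite3`, and substitutes `strftime('%Y%m', ...)` for date_format. It runs the query once without distinct and once with it, and relies on the default window frame including all rows of the current month when the ordering key is the month.

```python
import sqlite3

# Hypothetical demo data: six attempts on one exam across three months.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_attempts (exam_id INTEGER, start_time TEXT)")
conn.executemany(
    "INSERT INTO demo_attempts VALUES (?, ?)",
    [
        (9001, "2021-01-05 10:00:00"),
        (9001, "2021-01-20 10:00:00"),
        (9001, "2021-02-11 10:00:00"),
        (9001, "2021-03-02 10:00:00"),
        (9001, "2021-03-09 10:00:00"),
        (9001, "2021-03-30 10:00:00"),
    ],
)

query = """
    SELECT {distinct} exam_id,
           strftime('%Y%m', start_time) AS start_month,
           COUNT(start_time) OVER (
               PARTITION BY exam_id, strftime('%Y%m', start_time)
           ) AS month_cnt,
           COUNT(start_time) OVER (
               PARTITION BY exam_id ORDER BY strftime('%Y%m', start_time)
           ) AS cum_exam_cnt
    FROM demo_attempts
    ORDER BY exam_id, start_month
"""

all_rows = conn.execute(query.format(distinct="")).fetchall()
rows = conn.execute(query.format(distinct="DISTINCT")).fetchall()

print(len(all_rows))  # one output row per source row
for row in rows:
    print(row)        # one row per (exam_id, month)
```

The first window partitions by month to get the monthly count; the second orders by month inside the exam partition, which turns count into a running total.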

2.3 Monthly and cumulative answering activity

My code: I couldn't work out the last three columns

select 
distinct date_format(start_time,'%Y%m') start_month,
count(uid)over(partition by date_format(start_time,'%Y%m')) mau,
# if(count>0,count,0) month_add_uv,
# max(month_add_uv)over(partition by date_format(start_time,'%Y%m')) max_month_add_uv,
# max(mau)over() cum_sum_uv
from exam_record
group by uid,start_month
order by start_month

Expert's code:

select 
  start_month
, count(distinct uid) as mau
, count(if(rn=1, uid, null)) as month_add_uv
, max(count(if(rn=1, uid, null))) over(order by start_month) as max_month_add_uv
, sum(count(if(rn=1, uid, null))) over(order by start_month) as cum_sum_uv
from (
    select
      uid, date_format(start_time, '%Y%m') as start_month
    , row_number() over(partition by uid order by start_time) as rn
    from exam_record
) t
group by start_month
;

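Review:

(1) [Ranking window functions]

●   rank() over(): 1, 1, 3, 4

●   dense_rank() over(): 1, 1, 2, 3

●   row_number() over(): 1, 2, 3, 4

Here row_number() over(partition by uid order by start_time) produces exactly one 1 per user, on that user's first-ever attempt. A row with rn = 1 therefore marks the user as new in that month.

(2)

  • SQL query syntax structure and execution order
    • Execution order: from--where--group by--having--select--order by--limit
    • Syntax structure: select--from--where--group by--having--order by--limit

Window functions are evaluated in the select step, after group by and having. That is why max(count(...)) over(...) can wrap an aggregate, and why a window result cannot be filtered in where or having of the same query.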
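The row_number() trick from (1) can be traced on a few rows. This is a sketch with assumptions: the table `demo_attempts` and its data are made up, it runs on SQLite via Python's `sqlite3`, and the aggregation is done in a CTE before the window functions are applied, instead of nesting count inside the window call as the solution above does. The two forms compute the same columns.

```python
import sqlite3

# Hypothetical demo data: user 1 is active in all three months.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_attempts (uid INTEGER, start_time TEXT)")
conn.executemany(
    "INSERT INTO demo_attempts VALUES (?, ?)",
    [
        (1, "2021-01-05 10:00:00"),
        (2, "2021-01-20 10:00:00"),
        (1, "2021-02-10 10:00:00"),
        (3, "2021-02-03 10:00:00"),
        (1, "2021-03-01 10:00:00"),
        (4, "2021-03-15 10:00:00"),
        (5, "2021-03-16 10:00:00"),
        (6, "2021-03-17 10:00:00"),
    ],
)

rows = conn.execute(
    """
    WITH first_flag AS (
        SELECT uid,
               strftime('%Y%m', start_time) AS start_month,
               ROW_NUMBER() OVER (PARTITION BY uid ORDER BY start_time) AS rn
        FROM demo_attempts
    ),
    monthly AS (
        SELECT start_month,
               COUNT(DISTINCT uid) AS mau,
               COUNT(CASE WHEN rn = 1 THEN uid END) AS month_add_uv  -- first-ever attempts
        FROM first_flag
        GROUP BY start_month
    )
    SELECT start_month,
           mau,
           month_add_uv,
           MAX(month_add_uv) OVER (ORDER BY start_month) AS max_month_add_uv,
           SUM(month_add_uv) OVER (ORDER BY start_month) AS cum_sum_uv
    FROM monthly
    ORDER BY start_month
    """
).fetchall()

for row in rows:
    print(row)
```

Because each user has exactly one row with rn = 1, summing those rows month by month gives the cumulative number of distinct users without a second count(distinct ...).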
