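Introduction
Scenario:
Many internet platforms run consecutive-login reward programs to increase user engagement and loyalty. For example, gaming platforms give players who log in on consecutive days in-game items or coins, and learning apps award points to users who study on consecutive days, which can be redeemed for courses or other benefits. These incentives are meant to build a habit of regular use, which improves product activity and retention. For platform operators, analyzing consecutive-login data also reveals users' habits and loyalty, which in turn informs product and operations decisions.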
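Problem description:
Suppose we have a table named login_table that records user logins, with two columns: uid (user ID) and dt (login date). We need to complete the following three tasks:
- Find users who logged in on three or more consecutive days: list the users who have at least one run of three or more consecutive login days in the period. This identifies highly active, loyal users who can then be targeted with fine-grained operations and rewards.
- Find each user's longest run of consecutive login days: for each user, compute the longest run of consecutive days across all of their login records. This shows how activity differs between users and provides a basis for personalized operations.
- Find a user's longest login streak when a one-day gap is allowed: a single missed day between two logins does not break the streak, and the result is the number of calendar days covered by the longest such streak. For example, a user who logs in on the 1st, 3rd, 5th, and 6th has a maximum streak of 6 days (the 1st through the 6th). This measure is more forgiving: it tolerates a user occasionally missing a day while still using the product frequently.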
Data Preparation and Implementation
Data preparation
1 2025-01-01
1 2025-01-02
1 2025-01-03
2 2025-01-07
2 2025-01-08
3 2025-01-09
3 2025-01-10
3 2025-01-12
3 2025-01-13
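1. Find users who logged in on three or more consecutive days
Approach:
- Number each user's logins: use `row_number()`, partitioned by `uid` and ordered by `dt` ascending, to produce a sequence number `rn` for each of a user's login dates.
- Compute the streak anchor date: use `date_add` to subtract `rn` from `dt`. Within a run of consecutive dates, `dt` and `rn` both increase by 1 per row, so the result `first_day` is identical for every row of the run. Strictly, it is the day before the run's first login, which is all a grouping key needs.
- Select users with three or more consecutive days: `group by` `uid` and `first_day`, then use `having` to keep groups with at least 3 rows. The `uid` values of those groups are the users we want.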
with data as (
select 1 as uid,'2025-01-01' as dt union all
select 1 as uid,'2025-01-02' as dt union all
select 1 as uid,'2025-01-03' as dt union all
select 2 as uid,'2025-01-07' as dt union all
select 2 as uid,'2025-01-08' as dt union all
select 3 as uid,'2025-01-09' as dt union all
select 3 as uid,'2025-01-10' as dt union all
select 3 as uid,'2025-01-12' as dt union all
select 3 as uid,'2025-01-13' as dt
),
data2 as (
select uid,dt,row_number() over (partition by uid order by dt) rn from data
),
data3 as (
select uid,dt,rn,date_add(dt,-rn) as first_day from data2
)
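-- distinct in an outer query: a user with several qualifying streaks is listed only once
select distinct uid from (select uid from data3 group by uid,first_day having count(1) >= 3) t;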
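To sanity-check the grouping trick outside a SQL engine, the same steps can be sketched in Python. This is only an illustrative sketch: the function name `users_with_streak` and the `min_days` parameter are assumptions made for this example. It also deduplicates dates first, whereas the query assumes at most one row per user per day (a duplicate row would shift `rn` and split a streak).

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows (uid, dt) copied from the data preparation section.
LOGINS = [
    (1, "2025-01-01"), (1, "2025-01-02"), (1, "2025-01-03"),
    (2, "2025-01-07"), (2, "2025-01-08"),
    (3, "2025-01-09"), (3, "2025-01-10"), (3, "2025-01-12"), (3, "2025-01-13"),
]


def users_with_streak(rows, min_days=3):
    """Return the sorted uids that have at least `min_days` consecutive login days."""
    by_user = {}
    for uid, dt in rows:
        by_user.setdefault(uid, set()).add(date.fromisoformat(dt))

    result = []
    for uid, days in by_user.items():
        streaks = Counter()
        # enumerate(..., start=1) plays the role of row_number() over (order by dt).
        for rn, day in enumerate(sorted(days), start=1):
            # dt - rn is constant within one run of consecutive dates.
            streaks[day - timedelta(days=rn)] += 1
        if any(count >= min_days for count in streaks.values()):
            result.append(uid)
    return sorted(result)


print(users_with_streak(LOGINS))  # [1]
```

Only user 1 qualifies on the sample data: user 2 has a 2-day run, and user 3 has two separate 2-day runs.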
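2. Find each user's longest run of consecutive login days
Approach:
- Number each user's logins: use the window function `row_number()`, partitioned by `uid` and ordered by `dt` ascending, to produce a sequence number `rn` for each of a user's login dates.
- Compute the streak anchor date: use `date_add` to subtract `rn` from `dt`, giving a value `first_day` that is identical for every row in the same run of consecutive dates.
- Count the days in each run: `group by` `uid` and `first_day`, and use `count(*)` to get `login_day`, the number of days in each of a user's runs.
- Take each user's longest run: `group by` `uid` again and use `max(login_day)` to pick the largest value among that user's runs, which is the user's maximum number of consecutive login days.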
with data as (
select 1 as uid,'2025-01-01' as dt union all
select 1 as uid,'2025-01-02' as dt union all
select 1 as uid,'2025-01-03' as dt union all
select 2 as uid,'2025-01-07' as dt union all
select 2 as uid,'2025-01-08' as dt union all
select 3 as uid,'2025-01-09' as dt union all
select 3 as uid,'2025-01-10' as dt union all
select 3 as uid,'2025-01-12' as dt union all
select 3 as uid,'2025-01-13' as dt
),
data2 as (
select uid,dt,row_number() over (partition by uid order by dt) rn from data
),
data3 as (
select uid,dt,rn,date_add(dt,-rn) as first_day from data2
),
data4 as (
select uid,first_day,count(*) as login_day from data3 group by uid,first_day
)
select uid,max(login_day) as max_login_day from data4 group by uid;
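The four steps can be mirrored in a few lines of Python to check the expected result on the sample data. This is a hedged sketch rather than a replacement for the query; the name `max_streak_per_user` is an assumption for this example, and the sketch deduplicates dates per user before numbering them.

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows (uid, dt) copied from the data preparation section.
LOGINS = [
    (1, "2025-01-01"), (1, "2025-01-02"), (1, "2025-01-03"),
    (2, "2025-01-07"), (2, "2025-01-08"),
    (3, "2025-01-09"), (3, "2025-01-10"), (3, "2025-01-12"), (3, "2025-01-13"),
]


def max_streak_per_user(rows):
    """Map each uid to the length of its longest run of consecutive login days."""
    by_user = {}
    for uid, dt in rows:
        by_user.setdefault(uid, set()).add(date.fromisoformat(dt))

    result = {}
    for uid, days in by_user.items():
        # Group key = dt - rn, counted per group (the count(*) ... group by step).
        runs = Counter(
            day - timedelta(days=rn)
            for rn, day in enumerate(sorted(days), start=1)
        )
        # max(login_day) over the user's runs.
        result[uid] = max(runs.values())
    return result


print(max_streak_per_user(LOGINS))  # {1: 3, 2: 2, 3: 2}
```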
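3. Find a user's longest login streak when a one-day gap is allowed. For example, logins on the 1st, 3rd, 5th, and 6th give a maximum of 6 days.
Approach:
- Find the previous login date: use `lag`, partitioned by `uid` and ordered by `dt` ascending, to get each row's previous login date `prev_dt`.
- Flag streak breaks: use `datediff` to compute the difference between `dt` and `prev_dt` and set `flag` from it: 0 if the difference is at most 2 days or is null (the user's first login), otherwise 1.
- Compute the running sum of flags: use `sum` as a window function, partitioned by `uid` and ordered by `dt` ascending, to accumulate `flag` into `sum_flag`. Each break increments the sum, so all rows of one streak share the same `sum_flag`.
- Compute each segment's span: group by `uid` and `sum_flag`, and use `datediff` on `max(dt)` and `min(dt)` to get the span `diff` of each segment.
- Get the longest streak: group by `uid` and take `max(diff)+1` as `max_login`, the user's maximum number of consecutive login days.
Key point: logins whose dates differ by at most 2 days are placed in the same segment, and the number of calendar days each segment covers is the length of that streak.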
with data as (
select 1 as uid,'2025-01-01' as dt union all
select 1 as uid,'2025-01-02' as dt union all
select 1 as uid,'2025-01-04' as dt union all
select 2 as uid,'2025-01-07' as dt union all
select 2 as uid,'2025-01-08' as dt union all
select 2 as uid,'2025-01-11' as dt union all
select 2 as uid,'2025-01-13' as dt union all
select 2 as uid,'2025-01-15' as dt union all
select 3 as uid,'2025-01-09' as dt union all
select 3 as uid,'2025-01-10' as dt union all
select 3 as uid,'2025-01-12' as dt union all
select 3 as uid,'2025-01-15' as dt
),
data2 as (
select uid,dt,lag(dt, 1) over (partition by uid order by dt) prev_dt from data
),
data3 as (
select uid,dt,prev_dt,if(datediff(dt, prev_dt) <= 2 or datediff(dt, prev_dt) is null, 0 ,1) flag from data2
),
data4 as (
select uid,dt,prev_dt,flag,sum(flag) over(partition by uid order by dt) as sum_flag from data3
),
data5 as (
select uid,datediff(max(dt),min(dt)) diff from data4 group by uid,sum_flag
)
select uid,max(diff)+1 as max_login from data5 group by uid;
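The segment logic can also be checked with a short Python sketch. It is an illustration under stated assumptions: the function name `max_span_with_gap` and the `max_gap` parameter (2, matching the `<= 2` test in the query) are introduced here and do not appear in the original.

```python
from datetime import date

# Sample rows (uid, dt) copied from the query for task 3.
LOGINS = [
    (1, "2025-01-01"), (1, "2025-01-02"), (1, "2025-01-04"),
    (2, "2025-01-07"), (2, "2025-01-08"), (2, "2025-01-11"),
    (2, "2025-01-13"), (2, "2025-01-15"),
    (3, "2025-01-09"), (3, "2025-01-10"), (3, "2025-01-12"), (3, "2025-01-15"),
]


def max_span_with_gap(rows, max_gap=2):
    """Map each uid to the longest span in days where adjacent logins differ by <= max_gap."""
    by_user = {}
    for uid, dt in rows:
        by_user.setdefault(uid, set()).add(date.fromisoformat(dt))

    result = {}
    for uid, days in by_user.items():
        ordered = sorted(days)
        start = prev = ordered[0]
        best = 1
        for day in ordered[1:]:
            if (day - prev).days > max_gap:
                # Same condition that sets flag = 1 in the query: a new segment starts.
                start = day
            prev = day
            # Span of the current segment so far: datediff(max(dt), min(dt)) + 1.
            best = max(best, (day - start).days + 1)
        result[uid] = best
    return result


print(max_span_with_gap(LOGINS))  # {1: 4, 2: 5, 3: 4}
```

On the example from the problem statement, `max_span_with_gap([(7, "2025-01-01"), (7, "2025-01-03"), (7, "2025-01-05"), (7, "2025-01-06")])` gives `{7: 6}`.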
Key Takeaways
1. Window functions: lag, row_number
https://blog.csdn.net/Ahuuua/article/details/127136611
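Basic syntax: function_name(arguments) OVER (PARTITION BY clause ORDER BY clause ROWS/RANGE clause)
- Function name: aggregate functions such as sum, max, min, count, and avg, as well as row-comparison functions such as lead and lag;
- over: a keyword indicating that the preceding function is a window (analytic) function rather than an ordinary aggregate function;
- Window clause: the content inside the parentheses after the over keyword.
lag(): row-comparison window function
lag/lead(arg1, arg2, arg3): arg1 is the column name; arg2 is the offset, which cannot be negative and defaults to 1; arg3 is the default value returned when the offset falls outside the window, and is null if not specified. lag takes the value from n rows earlier; lead takes the value from n rows later.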
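To make the three arguments concrete, here is a small Python emulation over a single partition that is already ordered. The helpers `lag` and `lead` below are written for this illustration only and are not part of any SQL engine.

```python
def lag(values, offset=1, default=None):
    """For each row, return the value `offset` rows earlier, or `default` if there is none."""
    if offset < 0:
        raise ValueError("offset cannot be negative")
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """For each row, return the value `offset` rows later, or `default` if there is none."""
    if offset < 0:
        raise ValueError("offset cannot be negative")
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]


# User 3's login dates from the sample data, ordered by dt.
dates = ["2025-01-09", "2025-01-10", "2025-01-12", "2025-01-13"]

print(lag(dates))   # [None, '2025-01-09', '2025-01-10', '2025-01-12']
print(lead(dates))  # ['2025-01-10', '2025-01-12', '2025-01-13', None]
```

The `None` in the first `lag` result corresponds to the null `prev_dt` that the task 3 query treats as the user's first login.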
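row_number(): ranking window function
row_number assigns every row in the result a unique row number. The number follows the ORDER BY rule and restarts from 1 in each partition. rank and dense_rank use the same ordering but give tied rows the same number: rank leaves a gap after a tie, dense_rank does not.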
value | rank | row_number | dense_rank |
---|---|---|---|
100 | 1 | 1 | 1 |
100 | 1 | 2 | 1 |
90 | 3 | 3 | 2 |
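The table above can be reproduced with a short Python sketch. It assumes the values are ordered descending, which is what the table suggests; the function name `rankings` is made up for this example.

```python
def rankings(values):
    """Return (value, rank, row_number, dense_rank) tuples, ordered by value descending."""
    rows = []
    rank = dense_rank = 0
    prev = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(sorted(values, reverse=True), start=1):
        if value != prev:
            rank = row_number   # rank jumps to the row position, leaving gaps after ties
            dense_rank += 1     # dense_rank advances by exactly one per distinct value
            prev = value
        rows.append((value, rank, row_number, dense_rank))
    return rows


for row in rankings([100, 100, 90]):
    print(row)
# (100, 1, 1, 1)
# (100, 1, 2, 1)
# (90, 3, 3, 2)
```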
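2. Date functions
Three forms of date values:
- `DATE`: YYYY-MM-DD; the current value comes from CURRENT_DATE()
- `DATETIME`: YYYY-MM-DD HH:MM:SS; the current value comes from CURRENT_TIMESTAMP()
- `TIMESTAMP`: a timestamp, e.g. 1973-12-30 15:30:00 written as 19731230153000; UNIX_TIMESTAMP() returns the current time as seconds since 1970-01-01 00:00:00 UTC
Common functions:
- `DATEDIFF(end,start)`: computes end - start, in days
- `TIMESTAMPDIFF(unit,start,end)`: computes end - start, in the given unit
  - unit: second, minute, hour, day, week, month, quarter, year
- `DATE_ADD(date, num)`: returns date plus num days; a positive num moves forward in time and a negative num moves backward. This two-argument form is the Hive/Spark SQL one used in the queries above; MySQL writes it as DATE_ADD(date, INTERVAL num DAY).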
select CURRENT_DATE(),CURRENT_TIMESTAMP(),UNIX_TIMESTAMP();
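For readers who want to verify date arithmetic by hand, the functions above have close counterparts in Python's standard `datetime` module. The helpers below are hypothetical stand-ins named after the SQL functions, not bindings to any database, and `timestampdiff` only covers the fixed-length units.

```python
from datetime import date, datetime, timedelta

# Seconds per unit for the fixed-length units only (assumption: month, quarter,
# and year are left out because their length varies).
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def datediff(end, start):
    """end - start in days, like DATEDIFF(end, start)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def date_add(day, num):
    """day plus num days (num may be negative), like the two-argument DATE_ADD."""
    return (date.fromisoformat(day) + timedelta(days=num)).isoformat()


def timestampdiff(unit, start, end):
    """Whole units between two 'YYYY-MM-DD HH:MM:SS' strings, truncated toward zero."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return int(delta.total_seconds() / _UNIT_SECONDS[unit])


print(datediff("2025-01-13", "2025-01-09"))   # 4
print(date_add("2025-01-03", -3))             # 2024-12-31
print(timestampdiff("hour", "2025-01-01 00:00:00", "2025-01-01 01:30:00"))  # 1
```

The second call is the `dt - rn` step from the first two queries: user 1's third login (2025-01-03, rn = 3) maps to the anchor date 2024-12-31.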