Contents
Description: 1225. Report Contiguous Dates
Data preparation:
Analysis:
Code:
Summary:
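Description: 1225. Report Contiguous Dates
Table: Failed
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| fail_date    | date    |
+--------------+---------+
fail_date is the primary key (column with unique values) for this table.
This table contains the dates of failed tasks.
Table: Succeeded
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| success_date | date    |
+--------------+---------+
success_date is the primary key (column with unique values) for this table.
This table contains the dates of succeeded tasks.
The system runs one task every day. Each task is independent of the previous tasks. A task's state is either failed or succeeded.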
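Write a solution that reports, for the period from 2019-01-01 to 2019-12-31, each run of consecutive days with the same task state (period_state) together with its start and end dates (start_date and end_date). That is, if the tasks failed, report the start and end dates of the failed period; if they succeeded, report the start and end dates of the succeeded period. Return the result ordered by start_date. The result format is shown in the following example: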
Example 1:
Input:
Failed table:
+-------------------+
| fail_date         |
+-------------------+
| 2018-12-28        |
| 2018-12-29        |
| 2019-01-04        |
| 2019-01-05        |
+-------------------+
Succeeded table:
+-------------------+
| success_date      |
+-------------------+
| 2018-12-30        |
| 2018-12-31        |
| 2019-01-01        |
| 2019-01-02        |
| 2019-01-03        |
| 2019-01-06        |
+-------------------+
Output:
+--------------+--------------+--------------+
| period_state | start_date   | end_date     |
+--------------+--------------+--------------+
| succeeded    | 2019-01-01   | 2019-01-03   |
| failed       | 2019-01-04   | 2019-01-05   |
| succeeded    | 2019-01-06   | 2019-01-06   |
+--------------+--------------+--------------+
Explanation:
The result ignores the 2018 records, because we only care about records from 2019-01-01 to 2019-12-31.
From 2019-01-01 to 2019-01-03 all tasks succeeded, so the system state was "succeeded".
From 2019-01-04 to 2019-01-05 all tasks failed, so the system state was "failed".
From 2019-01-06 to 2019-01-06 all tasks succeeded, so the system state was "succeeded".
Data preparation:
Create table If Not Exists Failed (fail_date date);
Create table If Not Exists Succeeded (success_date date);
Truncate table Failed;
insert into Failed (fail_date) values ('2018-12-28');
insert into Failed (fail_date) values ('2018-12-29');
insert into Failed (fail_date) values ('2019-01-04');
insert into Failed (fail_date) values ('2019-01-05');
Truncate table Succeeded;
insert into Succeeded (success_date) values ('2018-12-30');
insert into Succeeded (success_date) values ('2018-12-31');
insert into Succeeded (success_date) values ('2019-01-01');
insert into Succeeded (success_date) values ('2019-01-02');
insert into Succeeded (success_date) values ('2019-01-03');
insert into Succeeded (success_date) values ('2019-01-06');
Analysis:
① First add a state column to each table, then stack the two tables with union all.
select success_date date, 'succeeded' as state
from Succeeded
union all
select fail_date, 'failed'
from Failed
② Keep only the 2019 dates and sort by date.
with t1 as (select success_date date, 'succeeded' as state
            from Succeeded
            union all
            select fail_date, 'failed'
            from Failed)
select date, state
from t1
where date between '2019-01-01' and '2019-12-31'
order by date
③ Number the rows within each state, ordered by date.
with t1 as (select success_date date, 'succeeded' as state
            from Succeeded
            union all
            select fail_date, 'failed'
            from Failed)
   , t2 as (select date, state
            from t1
            where date between '2019-01-01' and '2019-12-31'
            order by date)
select *, row_number() over (partition by state order by date) r1
from t2
④ Build a helper date r2 by subtracting r1 days from date. Within one state, consecutive days advance date and r1 by the same amount, so rows with the same r2 belong to the same unbroken run.
with t1 as (select success_date date, 'succeeded' as state
            from Succeeded
            union all
            select fail_date, 'failed'
            from Failed)
   , t2 as (select date, state
            from t1
            where date between '2019-01-01' and '2019-12-31'
            order by date)
   , t3 as (select *, row_number() over (partition by state order by date) r1
            from t2)
select *, date_sub(date, interval r1 day) r2
from t3
order by date
⑤ Partition by state and the helper column r2, order by date, and take the smallest (first) and largest (last) date of each group. Here t4 is the result of step ④. Two ways of computing end_date are given: max, or last_value with an explicit frame.
select distinct state period_state,
       first_value(date) over (partition by state, r2 order by date) start_date,
       max(date) over (partition by state, r2) end_date
       -- alternative: last_value(date) over (partition by state, r2 order by date rows between unbounded preceding and unbounded following) end_date
from t4
order by start_date
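The helper-date idea from steps ③ to ⑤ can also be traced outside SQL. The following is a minimal Python sketch, not the author's method; the function name contiguous_periods and the variable names are made up for illustration, and the rows are the 2019 part of Example 1.

```python
from datetime import date, timedelta
from itertools import groupby

# The 2019 rows from Example 1 as (date, state) pairs.
rows = [
    (date(2019, 1, 1), "succeeded"),
    (date(2019, 1, 2), "succeeded"),
    (date(2019, 1, 3), "succeeded"),
    (date(2019, 1, 4), "failed"),
    (date(2019, 1, 5), "failed"),
    (date(2019, 1, 6), "succeeded"),
]


def contiguous_periods(rows):
    """Hypothetical helper: group rows into runs via 'date minus row number'."""
    counters = {}  # per-state row number, like row_number() over (partition by state order by date)
    keyed = []
    for day, state in sorted(rows):
        counters[state] = counters.get(state, 0) + 1
        helper = day - timedelta(days=counters[state])  # plays the role of r2
        keyed.append((state, helper, day))

    # Rows that share (state, helper) form one unbroken run.
    keyed.sort()
    periods = []
    for (state, _helper), group in groupby(keyed, key=lambda k: (k[0], k[1])):
        days = [k[2] for k in group]
        periods.append((state, days[0], days[-1]))

    # Same final ordering as "order by start_date".
    return sorted(periods, key=lambda p: p[1])


for state, start, end in contiguous_periods(rows):
    print(state, start, end)
```

For the succeeded rows the helper date is 2018-12-31 for the first three days and 2019-01-02 for 2019-01-06, which is why the last day ends up in its own period.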
Code:
with t1 as (
    -- tag every date with its state and stack the two tables
    select success_date date, 'succeeded' as state
    from Succeeded
    union all
    select fail_date, 'failed'
    from Failed)
   , t2 as (
    -- keep only the 2019 dates
    select date, state
    from t1
    where date between '2019-01-01' and '2019-12-31'
    order by date)
   , t3 as (
    -- row number within each state, ordered by date
    select *, row_number() over (partition by state order by date) r1
    from t2)
   , t4 as (
    -- helper date: identical for all rows of one unbroken run
    select *, date_sub(date, interval r1 day) r2
    from t3
    order by date)
select distinct state period_state,
       first_value(date) over (partition by state, r2 order by date) start_date,
       max(date) over (partition by state, r2) end_date
       -- alternative: last_value(date) over (partition by state, r2 order by date rows between unbounded preceding and unbounded following) end_date
from t4
order by start_date;
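As a quick sanity check, the same pipeline can be run against the sample data from Python. This is only a sketch under stated assumptions: an in-memory SQLite database stands in for MySQL, so date_sub is swapped for SQLite's date(d, '-N day') modifier, the date column is renamed d, and the final step uses group by with min/max instead of the distinct plus window functions shown above. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: SQLite replaces MySQL for this demo only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Failed (fail_date text);
    create table Succeeded (success_date text);
    insert into Failed values
        ('2018-12-28'), ('2018-12-29'), ('2019-01-04'), ('2019-01-05');
    insert into Succeeded values
        ('2018-12-30'), ('2018-12-31'), ('2019-01-01'),
        ('2019-01-02'), ('2019-01-03'), ('2019-01-06');
""")

query = """
with t1 as (select success_date as d, 'succeeded' as state from Succeeded
            union all
            select fail_date, 'failed' from Failed),
     t2 as (select d, state from t1
            where d between '2019-01-01' and '2019-12-31'),
     t3 as (select d, state,
                   row_number() over (partition by state order by d) as r1
            from t2),
     -- SQLite spelling of date_sub(d, interval r1 day)
     t4 as (select d, state, date(d, '-' || r1 || ' day') as r2 from t3)
select state as period_state, min(d) as start_date, max(d) as end_date
from t4
group by state, r2
order by start_date
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this prints the three periods from Example 1 in start_date order.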
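Summary:
① Computing end_date with a plain last_value gave wrong results, so I switched to max.
② When you need the largest or smallest date, consider the max and min functions first.
③ Pay attention to ordering; otherwise the results can come out scrambled once there is more data.
④ first_value returns the first value of the window, so the order by matters.
⑤ last_value returns the last value of the window frame. When the window has an order by, the default frame is
range between unbounded preceding and current row
so by default the "last" row is just the current row. To get the last row of the whole partition, set the frame explicitly, for example:
order by date rows between unbounded preceding and unbounded following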
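The effect of the frame on last_value is easy to see on three rows. The sketch below again uses an in-memory SQLite database as a stand-in for MySQL (an assumption; both follow the same default-frame rule when order by is present), and the table name runs and column name d are made up for the demo.

```python
import sqlite3

# Assumption: SQLite (3.25+) replaces MySQL for this demo only.
conn = sqlite3.connect(":memory:")
conn.execute("create table runs (d text)")
conn.executemany(
    "insert into runs values (?)",
    [("2019-01-01",), ("2019-01-02",), ("2019-01-03",)],
)

rows = conn.execute("""
    select d,
           -- default frame: stops at the current row
           last_value(d) over (order by d) as default_frame,
           -- explicit frame: covers the whole partition
           last_value(d) over (order by d
               rows between unbounded preceding and unbounded following) as full_frame
    from runs
    order by d
""").fetchall()

for row in rows:
    print(row)
# ('2019-01-01', '2019-01-01', '2019-01-03')
# ('2019-01-02', '2019-01-02', '2019-01-03')
# ('2019-01-03', '2019-01-03', '2019-01-03')
```

With the default frame, last_value simply echoes the current row's date, which matches the problem described in item ①; only the explicit frame returns the true end of the run.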