- 数据和目标
有如下数据存储了各组件的有效期间(此处起止日期用数字代替以便查阅),目标为将有重叠的期间合并到一起。
SQL> SELECT * FROM demo_eff_periods;
COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------
1 1000 1999
1 1500 3000
1 3000 3999
1 4000 4999
2 1000 4999
2 2000 3999
3 1000 1999
3 1500 2999
3 1000 3999
4 1000 1999
4 2000 2999
4 3000 3999
5 1000 1999
5 1000 1999
5 4000 4999
15 rows selected
将数据用下图表示,除组件1的第二段区间为闭区间,其它均为左闭右开。
- 例程一:转化为连续日期合并
先使用递归语法把每段期间拆解出具体的每一天,再将连续的日期整合为新区间。
这里使用dense_rank是因为可能有重复的日期出现。
如果出现上一期间的结束日期与下一期间的开始日期刚好连续的情况,这个方法将会把这两个区间合并到一起。
SQL> SELECT component_item_id,
2 MIN(new_date) effectivity_date,
3 MAX(new_date) disable_date
4 FROM (SELECT dep.component_item_id,
5 dep.effectivity_date + LEVEL - 1 new_date,
6 dense_rank() over(PARTITION BY dep.component_item_id ORDER BY dep.effectivity_date + LEVEL) dr
7 FROM demo_eff_periods dep
8 CONNECT BY LEVEL + dep.effectivity_date - 1 <= dep.disable_date
9 AND PRIOR dep.rowid = dep.rowid
10 AND PRIOR sys_guid() IS NOT NULL)
11 GROUP BY component_item_id, new_date - dr;
COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------
1 1000 4999
2 1000 4999
3 1000 3999
4 1000 3999
5 1000 1999
5 4000 4999
6 rows selected
- 例程二:递归串联区间合并
使用RSF语法,先找到所有不被其它区间包含的期间开始日期,再递归串联后续区间,最终得以合并。
将SQL中注释的部分解除,则可实现如例程一的进一步合并连续区间的效果。
SQL> WITH rsf_eff_periods(component_item_id, effectivity_date, disable_date) AS
2 (SELECT dep.component_item_id, dep.effectivity_date, dep.disable_date
3 FROM demo_eff_periods dep
4 WHERE NOT EXISTS
5 (SELECT 1
6 FROM demo_eff_periods dep0
7 WHERE dep0.component_item_id = dep.component_item_id
8 AND dep0.effectivity_date < dep.effectivity_date
9 AND dep0.disable_date >= dep.effectivity_date /*- 1*/)
10 UNION ALL
11 SELECT rep.component_item_id, rep.effectivity_date, dep.disable_date
12 FROM rsf_eff_periods rep, demo_eff_periods dep
13 WHERE rep.component_item_id = dep.component_item_id
14 AND rep.disable_date >= dep.effectivity_date /*- 1*/
15 AND rep.disable_date < dep.disable_date)
16 SELECT component_item_id, effectivity_date, MAX(disable_date) disable_date
17 FROM rsf_eff_periods
18 GROUP BY component_item_id, effectivity_date;
COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------
1 1000 3999
1 4000 4999
2 1000 4999
3 1000 3999
4 1000 1999
4 2000 2999
4 3000 3999
5 1000 1999
5 4000 4999
9 rows selected
CONNECT BY写法:
SELECT component_item_id, effectivity_date, MAX(disable_date) disable_date
FROM (SELECT dep.component_item_id,
connect_by_root effectivity_date effectivity_date,
dep.disable_date
FROM demo_eff_periods dep
START WITH NOT EXISTS
(SELECT 1
FROM demo_eff_periods dep0
WHERE dep0.component_item_id = dep.component_item_id
AND dep0.effectivity_date < dep.effectivity_date
AND dep0.disable_date >= dep.effectivity_date /*- 1*/
)
CONNECT BY PRIOR dep.component_item_id = dep.component_item_id
AND PRIOR dep.disable_date >= dep.effectivity_date /*- 1*/
AND PRIOR dep.disable_date < dep.disable_date)
GROUP BY component_item_id, effectivity_date;
- 例程三:区间端点排序重组
分别找到不被其它期间包含的开始日期和结束日期,即合并后应得的区间端点,两类端点(开始和结束)分别排序,同位次的端点即组成新区间。
这里使用dense_rank是因为可能有重复的日期出现。
将SQL中注释的部分解除,则可实现如例程一的进一步合并连续区间的效果。
SQL> SELECT component_item_id,
2 MIN(new_date) effectivity_date,
3 MAX(new_date) disable_date
4 FROM (SELECT dep.component_item_id,
5 dep.effectivity_date new_date,
6 dense_rank() over(PARTITION BY dep.component_item_id ORDER BY dep.effectivity_date) dr
7 FROM demo_eff_periods dep
8 WHERE NOT EXISTS
9 (SELECT 1
10 FROM demo_eff_periods dep0
11 WHERE dep0.component_item_id = dep.component_item_id
12 AND dep0.effectivity_date < dep.effectivity_date
13 AND dep0.disable_date >= dep.effectivity_date /*- 1*/)
14 UNION ALL
15 SELECT dep.component_item_id,
16 dep.disable_date new_date,
17 dense_rank() over(PARTITION BY dep.component_item_id ORDER BY dep.disable_date) dr
18 FROM demo_eff_periods dep
19 WHERE NOT EXISTS
20 (SELECT 1
21 FROM demo_eff_periods dep0
22 WHERE dep0.component_item_id = dep.component_item_id
23 AND dep0.effectivity_date <= dep.disable_date /*+ 1*/
24 AND dep0.disable_date > dep.disable_date))
25 GROUP BY component_item_id, dr;
COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------
1 1000 3999
1 4000 4999
2 1000 4999
3 1000 3999
4 1000 1999
4 2000 2999
4 3000 3999
5 1000 1999
5 4000 4999
9 rows selected
- 例程四:区间端点排序重组(减少扫描表次数)
对于每一个原始期间,如有比自己早开始的期间其结束日期比自己的开始日期还大,则自己的开始日期将被吞并;如有比自己晚结束的期间其开始日期比自己的结束日期还小,则自己的结束日期将被吞并。
筛选出合并后应得的区间端点后,两类端点(开始和结束)分别排序,同位次的端点即组成新区间。
这里使用dense_rank是因为可能有重复的日期出现。
将SQL中注释的部分解除,则可实现如例程一的进一步合并连续区间的效果。
SQL> SELECT component_item_id,
2 MIN(new_date) effectivity_date,
3 MAX(new_date) disable_date
4 FROM (SELECT component_item_id,
5 new_date,
6 dense_rank() over(PARTITION BY component_item_id, date_pos ORDER BY new_date) dr
7 FROM (SELECT dep.component_item_id,
8 CASE
9 WHEN MAX(dep.disable_date)
10 over(PARTITION BY dep.component_item_id ORDER BY
11 dep.effectivity_date
12 RANGE BETWEEN unbounded preceding AND 1
13 preceding) >= dep.effectivity_date /*- 1*/ THEN
14 NULL
15 ELSE
16 dep.effectivity_date
17 END AS effectivity_date,
18 CASE
19 WHEN MIN(dep.effectivity_date)
20 over(PARTITION BY dep.component_item_id ORDER BY
21 dep.disable_date DESC
22 RANGE BETWEEN unbounded preceding AND 1
23 preceding) <= dep.disable_date /*+ 1*/ THEN
24 NULL
25 ELSE
26 dep.disable_date
27 END AS disable_date
28 FROM demo_eff_periods dep)
29 UNPIVOT(new_date FOR date_pos IN(effectivity_date, disable_date)))
30 GROUP BY component_item_id, dr;
COMPONENT_ITEM_ID EFFECTIVITY_DATE DISABLE_DATE
----------------- ---------------- ------------
1 1000 3999
1 4000 4999
2 1000 4999
3 1000 3999
4 1000 1999
4 2000 2999
4 3000 3999
5 1000 1999
5 4000 4999
9 rows selected
更多方法,欢迎大家讨论