0 Requirements
Given multiple time periods, each defined by a start time and an end time, merge all overlapping periods into a single interval.
-- Data: id, start time, end time
1 12 15
2 57 58
3 29 32
4 30 31
5 17 19
6 44 44
7 56 57
8 16 18
The merged result is as follows:
-- Result
id start_time end_time
1 12 15
2 16 19
3 29 32
4 44 44
5 56 58
1 Data preparation
create table test02 as
select 1 as id, 12 as start_time, 15 as end_time
union all
select 2 as id, 57 as start_time, 58 as end_time
union all
select 3 as id, 29 as start_time, 32 as end_time
union all
select 4 as id, 30 as start_time, 31 as end_time
union all
select 5 as id, 17 as start_time, 19 as end_time
union all
select 6 as id, 44 as start_time, 44 as end_time
union all
select 7 as id, 56 as start_time, 57 as end_time
union all
select 8 as id, 16 as start_time, 18 as end_time
2 Data analysis
Key question: how do we decide which intervals should be merged?
Put another way: which intervals intersect or overlap one another?
Approach: sort the rows by start time and end time. If the start time of the current row is less than or equal to the end time of the previous row, the two intervals overlap and must be merged. This condition marks the breakpoints between groups; a running sum() over() of the breakpoint flags then yields a group id, which solves the problem.
Step 1: Sort in ascending order by start time and end time, and fetch the end time of the previous row for comparison
select id,
       start_time,
       end_time,
       -- the first row has no previous row, so lag() falls back to its own end_time
       lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
from test02
Step 2: Compare each row with lag_end_time. When the start_time of the current row <= lag_end_time, set the flag to 0; otherwise set it to 1. (This is the classic technique of grouping on a change of condition: the flag must be 0 when the condition holds and 1 when it does not, so that the running sum increases only at the start of a new group.)
select id
     , start_time
     , end_time
     , case when start_time <= lag_end_time then 0 else 1 end as flg -- 0 when the condition holds, 1 when it does not
from (select id,
             start_time,
             end_time,
             lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
      from test02
     ) t
Step 3: Obtain the group id with a running sum() over()
select id
     , start_time
     , end_time
     , sum(flg) over (order by start_time, end_time) as grp_id -- running sum of breakpoint flags
from (select id
           , start_time
           , end_time
           , case when start_time <= lag_end_time then 0 else 1 end as flg -- 0 when the condition holds, 1 when it does not
      from (select id,
                   start_time,
                   end_time,
                   lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
            from test02
           ) t1
     ) t2
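To make the intermediate columns concrete, the sketch below computes lag_end_time, flg and grp_id side by side for the sample data. It inlines the eight sample rows in a common table expression so that it runs without the test02 table. The CTE names (src, with_lag, with_flg) are illustrative and not part of the original solution. It assumes an engine that supports WITH and window functions, such as Hive or PostgreSQL.

```sql
-- Sketch: show every intermediate column for the sample data.
with src as (
    select 1 as id, 12 as start_time, 15 as end_time
    union all select 2, 57, 58
    union all select 3, 29, 32
    union all select 4, 30, 31
    union all select 5, 17, 19
    union all select 6, 44, 44
    union all select 7, 56, 57
    union all select 8, 16, 18
),
with_lag as (
    -- end_time of the previous row in sorted order (own end_time for the first row)
    select id, start_time, end_time,
           lag(end_time, 1, end_time) over (order by start_time, end_time) as lag_end_time
    from src
),
with_flg as (
    -- 0 = overlaps the previous row, 1 = starts a new group
    select id, start_time, end_time, lag_end_time,
           case when start_time <= lag_end_time then 0 else 1 end as flg
    from with_lag
)
select id, start_time, end_time, lag_end_time, flg,
       sum(flg) over (order by start_time, end_time) as grp_id
from with_flg
order by start_time, end_time;
```

For the sample data this should return the following rows, where grp_id increases only on rows with flg = 1:
id start_time end_time lag_end_time flg grp_id
1 12 15 15 0 0
8 16 18 15 1 1
5 17 19 18 0 1
3 29 32 19 1 2
4 30 31 32 0 2
6 44 44 31 1 3
7 56 57 44 1 4
2 57 58 57 0 4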
Step 4: Take the minimum start_time and the maximum end_time within each group. The minimum is the start of the merged interval, the maximum is its end, and the group id (plus 1, because grp_id starts at 0) serves as the id
The final SQL is as follows:
select grp_id + 1 as id
     , min(start_time) as start_time
     , max(end_time) as end_time
from (
      select id
           , start_time
           , end_time
           , sum(flg) over (order by start_time, end_time) as grp_id -- running sum of breakpoint flags
      from (select id
                 , start_time
                 , end_time
                 , case when start_time <= lag_end_time then 0 else 1 end as flg -- 0 when the condition holds, 1 when it does not
            from (select id,
                         start_time,
                         end_time,
                         lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
                  from test02
                 ) t1
           ) t2
     ) t3
group by grp_id
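One caveat: comparing only with the previous row's end_time works for the sample data, but it can split a group when an earlier interval fully contains several later ones. For example, with the intervals (1, 10), (2, 3), (4, 5), the row (4, 5) is compared with end_time 3 and starts a new group even though it lies inside (1, 10). A possible variant, sketched below and not part of the original solution, compares start_time with the running maximum of end_time over all preceding rows. The sample rows and the names src, with_max, with_grp and prev_max_end are illustrative assumptions. The first row has no preceding rows, so prev_max_end is NULL there, flg becomes 1, and grp_id already starts at 1.

```sql
-- Sketch: merge intervals using the running maximum of end_time (handles nested intervals).
with src as (
    select 1 as id, 1 as start_time, 10 as end_time
    union all select 2, 2, 3
    union all select 3, 4, 5
    union all select 4, 20, 25
),
with_max as (
    -- largest end_time seen on any earlier row; NULL for the first row
    select id, start_time, end_time,
           max(end_time) over (order by start_time, end_time
                               rows between unbounded preceding and 1 preceding) as prev_max_end
    from src
),
with_grp as (
    -- a row starts a new group when it begins after everything seen so far has ended
    select start_time, end_time,
           sum(case when start_time <= prev_max_end then 0 else 1 end)
               over (order by start_time, end_time
                     rows between unbounded preceding and current row) as grp_id
    from with_max
)
select grp_id as id, min(start_time) as start_time, max(end_time) as end_time
from with_grp
group by grp_id
order by grp_id;
```

On these four rows the query should return two merged intervals, (1, 10) and (20, 25). On the original sample data it should give the same five intervals as the lag-based query.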
3 Summary
This is a classic interval-merging problem. The core idea is to construct the condition:
The start time of the current row <= the end time of the previous row (sorted in ascending order by start time and end time).
Then use the classic running-sum grouping technique and take the minimum start time and maximum end time within each group.