SQL Interview Questions: Merging Overlapping Intervals

Table of contents

0 Requirements

1 Data preparation

2 Data analysis

3 Summary


0 Requirements

Given a set of time periods, each defined by a start time and an end time, merge all overlapping periods into a single interval.

-- Data: id, start time, end time
1 12 15
2 57 58
3 29 32
4 30 31
5 17 19
6 44 44
7 56 57
8 16 18

The merged result is as follows:

-- Result
flag  start_time  end_time
1     12          15
2     16          19
3     29          32
4     44          44
5     56          58

1 Data preparation

create table test02 as
select 1 as id, 12 as start_time, 15 as end_time
union all
select 2 as id, 57 as start_time, 58 as end_time
union all
select 3 as id, 29 as start_time, 32 as end_time
union all
select 4 as id, 30 as start_time, 31 as end_time
union all
select 5 as id, 17 as start_time, 19 as end_time
union all
select 6 as id, 44 as start_time, 44 as end_time
union all
select 7 as id, 56 as start_time, 57 as end_time
union all
select 8 as id, 16 as start_time, 18 as end_time

2 Data analysis

Key point: how do we decide which intervals should be merged?

Put another way: which intervals intersect, and which are contained in one another?

Approach: sort the rows by start time and end time. If the start time of the current row is less than or equal to the end time of the previous row, the two intervals overlap and belong to the same merged interval. This condition gives us the breakpoints: flag each row that starts a new interval, then take a running sum of the flags with sum() over() to obtain a group id for every row. Once each row has a group id, the problem is solved.
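Before turning to SQL, the flag-and-running-sum idea can be sketched in a few lines of Python. This is only an illustration of the logic, not part of the original solution; the function name assign_groups is made up for this sketch.

```python
# Sample intervals from the requirements section: (id, start_time, end_time).
rows = [
    (1, 12, 15), (2, 57, 58), (3, 29, 32), (4, 30, 31),
    (5, 17, 19), (6, 44, 44), (7, 56, 57), (8, 16, 18),
]

def assign_groups(rows):
    """Mimic lag() + case + sum() over(): return (id, start, end, flg, grp_id) tuples."""
    # Equivalent of: order by start_time asc, end_time asc
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    result = []
    grp_id = 0
    prev_end = None
    for rid, start, end in ordered:
        # lag(end_time, 1, end_time): the first row falls back to its own end_time.
        lag_end = end if prev_end is None else prev_end
        # 0 when the row overlaps the previous one, 1 when it starts a new interval.
        flg = 0 if start <= lag_end else 1
        # Running sum of the flags, like sum(flg) over (order by ...).
        grp_id += flg
        result.append((rid, start, end, flg, grp_id))
        prev_end = end
    return result

groups = assign_groups(rows)
for row in groups:
    print(row)
```

Rows that share a grp_id belong to the same merged interval; the group id only increases on rows whose flag is 1.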

Step 1: Sort in ascending order by start time and end time, and fetch the end time of the previous row for comparison

select id,
       start_time,
       end_time,
       lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
from test02

 

Step 2: Compare against lag_end_time. When the start_time of the current row <= lag_end_time, set the flag to 0; otherwise set it to 1. (This is the classic pattern for grouping on a change of condition: the flag must be 0 when the condition holds and 1 when it does not, so that the running sum increases only at the start of a new interval.)

select id
     , start_time
     , end_time
     , case when start_time <= lag_end_time then 0 else 1 end as flg -- 0 when the condition holds, 1 otherwise
from (select id,
             start_time,
             end_time,
             lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
      from test02
     ) t

Step 3: Compute the group id as a running sum of the flag with sum() over()

select id
     , start_time
     , end_time
     , sum(flg) over (order by start_time, end_time) as grp_id
from (select id
           , start_time
           , end_time
           , case when start_time <= lag_end_time then 0 else 1 end as flg -- 0 when the condition holds, 1 otherwise
      from (select id,
                   start_time,
                   end_time,
                   lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
            from test02
           ) t
     ) t

Step 4: Take the minimum start time and the maximum end time within each group. The minimum is the start of the merged interval, the maximum is its end, and the group id serves as its id.

The final SQL is as follows:

select grp_id + 1      as id
     , min(start_time) as start_time
     , max(end_time)   as end_time
from (
         select id
              , start_time
              , end_time
              , sum(flg) over (order by start_time, end_time) as grp_id
         from (select id
                    , start_time
                    , end_time
                    , case when start_time <= lag_end_time then 0 else 1 end as flg -- 0 when the condition holds, 1 otherwise
               from (select id,
                            start_time,
                            end_time,
                            lag(end_time, 1, end_time) over (order by start_time asc, end_time asc) as lag_end_time
                     from test02
                    ) t
              ) t
     ) t
group by grp_id
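
To check the query end to end, it can be run against the sample data in an in-memory SQLite database from Python. This harness is a sketch and not part of the original article: it assumes the bundled SQLite is version 3.25 or later (needed for window functions), loads the rows with plain inserts instead of create table ... as, gives the subqueries distinct aliases (t1, t2, t3), and adds an order by so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test02 (id integer, start_time integer, end_time integer)")
conn.executemany(
    "insert into test02 values (?, ?, ?)",
    [(1, 12, 15), (2, 57, 58), (3, 29, 32), (4, 30, 31),
     (5, 17, 19), (6, 44, 44), (7, 56, 57), (8, 16, 18)],
)

# Same logic as the final SQL: lag -> flag -> running sum -> min/max per group.
final_sql = """
select grp_id + 1      as id
     , min(start_time) as start_time
     , max(end_time)   as end_time
from (select id, start_time, end_time
           , sum(flg) over (order by start_time, end_time) as grp_id
      from (select id, start_time, end_time
                 , case when start_time <= lag_end_time then 0 else 1 end as flg
            from (select id, start_time, end_time
                       , lag(end_time, 1, end_time)
                             over (order by start_time, end_time) as lag_end_time
                  from test02) t1) t2) t3
group by grp_id
order by grp_id
"""

merged = conn.execute(final_sql).fetchall()
for row in merged:
    print(row)
```

The five rows printed should match the result table in the requirements section.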

3 Summary

This is a classic interval-merging problem. The core idea is to construct the condition:

The start time of the current row <= the end time of the previous row (with rows sorted in ascending order by start time and end time).

Then apply the classic grouping technique (flag plus running sum) and take the minimum start time and maximum end time within each group.
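
One caveat: comparing only against the previous row's end time works for the sample data, but it can split a group when one long interval contains several shorter ones, because a later row may start after the previous short interval ends while still lying inside the long one. A common variant compares the start time against the largest end time of all earlier rows. The sketch below contrasts the two on a small made-up table (nested_demo and its rows are hypothetical, and SQLite 3.25 or later is assumed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table nested_demo (id integer, start_time integer, end_time integer)")
# Hypothetical data: interval 1 fully contains intervals 2 and 3.
conn.executemany("insert into nested_demo values (?, ?, ?)",
                 [(1, 1, 10), (2, 2, 3), (3, 4, 5)])

# Previous-row comparison, following the article's approach.
prev_row_sql = """
select grp_id + 1 as id, min(start_time) as start_time, max(end_time) as end_time
from (select start_time, end_time
           , sum(flg) over (order by start_time, end_time) as grp_id
      from (select start_time, end_time
                 , case when start_time <= lag(end_time, 1, end_time)
                                 over (order by start_time, end_time)
                        then 0 else 1 end as flg
            from nested_demo) t1) t2
group by grp_id
order by grp_id
"""

# Variant: compare against the running maximum end_time of all earlier rows.
# The first row has no earlier rows, so the max is NULL and the flag is 1;
# group ids therefore start at 1 and no "+ 1" is needed.
running_max_sql = """
select grp_id as id, min(start_time) as start_time, max(end_time) as end_time
from (select start_time, end_time
           , sum(flg) over (order by start_time, end_time, id) as grp_id
      from (select id, start_time, end_time
                 , case when start_time <= max(end_time) over (
                                 order by start_time, end_time, id
                                 rows between unbounded preceding and 1 preceding)
                        then 0 else 1 end as flg
            from nested_demo) t1) t2
group by grp_id
order by grp_id
"""

prev_row_result = conn.execute(prev_row_sql).fetchall()
running_max_result = conn.execute(running_max_sql).fetchall()
print(prev_row_result)     # the nested interval (4, 5) is reported separately
print(running_max_result)  # everything collapses into (1, 10)
```

If the input can contain nested intervals, the running-maximum comparison is the safer choice; on data without nesting, both conditions produce the same groups.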

Origin blog.csdn.net/godlovedaniel/article/details/126662876