Complex SQL: grouping rows by consecutive dates (the consecutive login days problem)

This article is based on Hive 1.1.0-cdh5.12.1.

Sample data:

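+---------+-------------+
| userid  |    date     |
+---------+-------------+
| a       | 2021-01-01  |
| a       | 2021-01-02  |
| a       | 2021-01-03  |
| a       | 2021-01-05  |
| a       | 2021-01-06  |
| b       | 2021-01-01  |
| b       | 2021-01-04  |
+---------+-------------+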

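Requirement: for each user and each login date, find the number of consecutive login days, i.e. the length of the unbroken run of daily logins that the date belongs to.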

Expected result:

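+---------+-------------+------------------+
| userid  |    date     | continuous_days  |
+---------+-------------+------------------+
| a       | 2021-01-01  | 3                |
| a       | 2021-01-02  | 3                |
| a       | 2021-01-03  | 3                |
| a       | 2021-01-05  | 2                |
| a       | 2021-01-06  | 2                |
| b       | 2021-01-01  | 1                |
| b       | 2021-01-04  | 1                |
+---------+-------------+------------------+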
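To pin down exactly what the expected result means, here is a minimal Python sketch that computes it with a plain sequential scan. This is an illustration of the requirement, not the SQL technique used below: each user's distinct dates are sorted, and a new run starts whenever the gap to the previous date is more than one day. The names `logins` and `streak_lengths_by_scan` are hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample data from the table above: (userid, login date)
logins = [
    ("a", date(2021, 1, 1)), ("a", date(2021, 1, 2)), ("a", date(2021, 1, 3)),
    ("a", date(2021, 1, 5)), ("a", date(2021, 1, 6)),
    ("b", date(2021, 1, 1)), ("b", date(2021, 1, 4)),
]

def streak_lengths_by_scan(rows):
    """Map every (userid, date) to the length of the consecutive-day run containing it."""
    result = {}
    rows = sorted(set(rows))  # drop duplicate logins, order by userid then date
    for user, user_rows in groupby(rows, key=lambda r: r[0]):
        dates = [d for _, d in user_rows]
        run = [dates[0]]
        for d in dates[1:]:
            if d - run[-1] == timedelta(days=1):
                run.append(d)  # next calendar day: extend the current run
            else:
                for x in run:  # gap found: every date in the closed run gets its length
                    result[(user, x)] = len(run)
                run = [d]
        for x in run:  # close the user's last run
            result[(user, x)] = len(run)
    return result

for (user, d), days in sorted(streak_lengths_by_scan(logins).items()):
    print(user, d, days)
```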

Reference implementation:

with t as ( -- sample data from the table above, built inline with union all
select 'a' as userid,to_date('2021-01-01') AS date union all
select 'a' as userid,to_date('2021-01-02') AS date union all
select 'a' as userid,to_date('2021-01-03') AS date union all
select 'a' as userid,to_date('2021-01-05') AS date union all
select 'a' as userid,to_date('2021-01-06') AS date union all
select 'b' as userid,to_date('2021-01-01') AS date union all
select 'b' as userid,to_date('2021-01-04') AS date 
)

-- continuous_days: number of rows sharing the same (userid, login_group), i.e. the length of the streak
select userid,date,rn,login_group,count(*) over(partition by userid,login_group) as continuous_days
from 
(
  select userid,date,rn,date_sub(date,rn) as login_group -- this subtraction is the key: rows from consecutive login days end up in the same group
  from 
  (
    -- rn numbers each user's logins 1, 2, 3, ... in date order
    select userid,date,row_number() over(partition by userid order by date) as rn from t
  ) t1
) t2
;
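The three query levels can be mirrored in Python to see what each one produces. This is a minimal sketch, assuming the data fits in memory; `continuous_days` is a hypothetical helper, and `d - timedelta(days=n)` stands in for Hive's `date_sub(date, rn)`. Unlike the query, the sketch drops duplicate (userid, date) pairs first: with duplicates, `row_number()` would give the same date two different `rn` values and split one streak into two groups.

```python
from collections import Counter
from datetime import date, timedelta

# Sample data from the table above: (userid, login date)
logins = [
    ("a", date(2021, 1, 1)), ("a", date(2021, 1, 2)), ("a", date(2021, 1, 3)),
    ("a", date(2021, 1, 5)), ("a", date(2021, 1, 6)),
    ("b", date(2021, 1, 1)), ("b", date(2021, 1, 4)),
]

def continuous_days(rows):
    """Mirror the Hive query: rn per user, login_group = date - rn days, count per group."""
    rows = sorted(set(rows))  # one row per (userid, date), ordered by userid then date
    # t1: row_number() over(partition by userid order by date)
    rn = {}
    seen = Counter()
    for user, d in rows:
        seen[user] += 1
        rn[(user, d)] = seen[user]
    # t2: date_sub(date, rn); consecutive dates of one user share a login_group
    login_group = {(u, d): d - timedelta(days=n) for (u, d), n in rn.items()}
    # outer query: count(*) over(partition by userid, login_group)
    group_size = Counter((u, g) for (u, _), g in login_group.items())
    return [
        (u, d, rn[(u, d)], login_group[(u, d)], group_size[(u, login_group[(u, d)])])
        for u, d in rows
    ]

for userid, d, n, group, days in continuous_days(logins):
    print(userid, d, n, group, days)
```

The printed rows carry the same values as the Hive output below, but sorted by userid and date; the query has no order by, so Hive is free to return its rows in any order.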

+---------+-------------+-----+--------------+------------------+--+
| userid  |    date     | rn  | login_group  | continuous_days  |
+---------+-------------+-----+--------------+------------------+--+
| a       | 2021-01-01  | 1   | 2020-12-31   | 3                |
| a       | 2021-01-02  | 2   | 2020-12-31   | 3                |
| a       | 2021-01-03  | 3   | 2020-12-31   | 3                |
| a       | 2021-01-05  | 4   | 2021-01-01   | 2                |
| a       | 2021-01-06  | 5   | 2021-01-01   | 2                |
| b       | 2021-01-04  | 2   | 2021-01-02   | 1                |
| b       | 2021-01-01  | 1   | 2020-12-31   | 1                |
+---------+-------------+-----+--------------+------------------+--+
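Why the subtraction groups a streak together can be written out for a single user. Here $d_i$ is the user's $i$-th distinct login date in ascending order, $r_i = i$ is its row number, $g_i$ is its login_group, and date arithmetic is in days:

```latex
\begin{aligned}
g_i &= d_i - r_i = d_i - i \\
d_{i+1} = d_i + 1 &\;\Rightarrow\; g_{i+1} = (d_i + 1) - (i + 1) = g_i \\
d_{i+1} \ge d_i + 2 &\;\Rightarrow\; g_{i+1} = d_{i+1} - (i + 1) \ge d_i + 1 - i = g_i + 1
\end{aligned}
```

So login_group stays constant inside a run of consecutive days and strictly increases at every gap; each distinct value therefore marks exactly one streak, and count(*) over(partition by userid,login_group) is that streak's length. In the output for user a, 2021-01-03 minus 3 days gives 2020-12-31, while after the gap 2021-01-05 minus 4 days gives 2021-01-01.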

Reposted from blog.csdn.net/cakecc2008/article/details/118294729