This article is based on Hive 1.1.0-cdh5.12.1.
The sample data is as follows:
+---------+-------------+
| userid  | date        |
+---------+-------------+
| a       | 2021-01-01  |
| a       | 2021-01-02  |
| a       | 2021-01-03  |
| a       | 2021-01-05  |
| a       | 2021-01-06  |
| b       | 2021-01-01  |
| b       | 2021-01-04  |
+---------+-------------+
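Requirement: for each user and each login date, compute the length of the consecutive-login streak that the date belongs to.
Expected result: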
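+---------+-------------+------------------+
| userid  | date        | continuous_days  |
+---------+-------------+------------------+
| a       | 2021-01-01  | 3                |
| a       | 2021-01-02  | 3                |
| a       | 2021-01-03  | 3                |
| a       | 2021-01-05  | 2                |
| a       | 2021-01-06  | 2                |
| b       | 2021-01-01  | 1                |
| b       | 2021-01-04  | 1                |
+---------+-------------+------------------+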
Reference implementation:
with t as (
select 'a' as userid,to_date('2021-01-01') AS date union all
select 'a' as userid,to_date('2021-01-02') AS date union all
select 'a' as userid,to_date('2021-01-03') AS date union all
select 'a' as userid,to_date('2021-01-05') AS date union all
select 'a' as userid,to_date('2021-01-06') AS date union all
select 'b' as userid,to_date('2021-01-01') AS date union all
select 'b' as userid,to_date('2021-01-04') AS date
)
select userid,date,rn,login_group,count(*) over(partition by userid,login_group) as continuous_days
from
(
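select userid,date,rn,date_sub(date,rn) as login_group --this subtraction is the key: within a run of consecutive dates both date and rn grow by 1 per row, so date minus rn stays constant and consecutive logins fall into the same group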
from
(
select userid,date,row_number() over(partition by userid order by date) as rn from t
) t1
) t2
;
+---------+-------------+-----+--------------+------------------+--+
| userid | date | rn | login_group | continuous_days |
+---------+-------------+-----+--------------+------------------+--+
| a | 2021-01-01 | 1 | 2020-12-31 | 3 |
| a | 2021-01-02 | 2 | 2020-12-31 | 3 |
| a | 2021-01-03 | 3 | 2020-12-31 | 3 |
| a | 2021-01-05 | 4 | 2021-01-01 | 2 |
| a | 2021-01-06 | 5 | 2021-01-01 | 2 |
| b | 2021-01-04 | 2 | 2021-01-02 | 1 |
| b | 2021-01-01 | 1 | 2020-12-31 | 1 |
+---------+-------------+-----+--------------+------------------+--+
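
The query above relies on each user having at most one row per date. If the same user appears twice on one day, row_number() hands out two different rn values for that date, and date_sub(date,rn) no longer stays constant across a streak. The following is a minimal sketch of one possible way to guard against that, by deduplicating before numbering the rows. It is not part of the original implementation: the column name login_date and the sample rows (including the duplicate for 2021-01-01) are assumptions made up for illustration.

```sql
-- Sketch only: login_date is a hypothetical column name and the rows below are illustrative.
with t as (
select 'a' as userid,to_date('2021-01-01') as login_date union all
select 'a' as userid,to_date('2021-01-01') as login_date union all -- duplicate login on the same day
select 'a' as userid,to_date('2021-01-02') as login_date union all
select 'a' as userid,to_date('2021-01-04') as login_date
)
select userid,login_date,count(*) over(partition by userid,login_group) as continuous_days
from
(
select userid,login_date,date_sub(login_date,rn) as login_group -- same date-minus-rn grouping as above
from
(
select userid,login_date,row_number() over(partition by userid order by login_date) as rn
from
(
select distinct userid,login_date from t -- keep one row per user per day before numbering
) d
) t1
) t2
;
```

With the duplicate removed, the three remaining dates get rn 1, 2 and 3, so 2021-01-01 and 2021-01-02 share login_group 2020-12-31 and 2021-01-04 gets 2021-01-01. The query should therefore report continuous_days of 2, 2 and 1. Without the select distinct step, the duplicate row would shift every later rn by one and the groups would no longer match the actual streaks.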