数据仓库 流失用户数

需求:求出最近7天未登陆的流失用户数。

依赖数据:dws_uv_detail_day(每日活跃设备明细表)

建表语句:

drop table if exists ads_wastage_count;
create external table ads_wastage_count( 
    `dt` string COMMENT '统计日期',
    `wastage_count` bigint COMMENT '流失设备数'
) 
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_wastage_count';

数据导入脚本:

巧用group和having,先对日活安设备id分组,分组后,登陆日期最近(最大的)都小于7天前,就是我们要的数据。

#!/bin/bash

if [ -n "$1" ];then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

hive=/opt/module/hive/bin/hive
APP=gmall

echo "-----------导入日期$do_date-----------"

sql="
insert into table "$APP".ads_wastage_count
select
     '$do_date',
     count(*)
from 
(
    select mid_id
    from "$APP".dws_uv_detail_day
    group by mid_id
    having max(dt)<=date_add('$do_date',-7)
)t1;
"

$hive -e "$sql"

猜你喜欢

转载自www.cnblogs.com/noyouth/p/13207831.html