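Commonly Used Hive Functions in an E-commerce Data Warehouse

Common functions

concat function
concat_ws function
STR_TO_MAP function
collect_set function
nvl function
Date functions
Comprehensive exercise

concat function

concat returns NULL if any of the strings being concatenated is NULL.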

hive> select concat('a','b');
ab

hive> select concat('a','b',null);
NULL
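The NULL propagation shown above can be modelled outside Hive. The following is a minimal Python sketch, not Hive code: `hive_concat` is a hypothetical helper name, and Python's `None` stands in for SQL NULL.

```python
from typing import Optional


def hive_concat(*parts: Optional[str]) -> Optional[str]:
    """Model of concat: a single NULL (None) argument makes the result NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


print(hive_concat("a", "b"))        # ab
print(hive_concat("a", "b", None))  # None
```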

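concat_ws function

concat_ws skips NULL arguments when joining strings, so a NULL argument does not turn the whole result into NULL.
concat_ws takes the separator as its first argument.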

hive> select concat_ws('-','a','b');
a-b

hive> select concat_ws('-','a','b',null);
a-b

hive> select concat_ws('','a','b',null);
ab
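For comparison with concat, here is a minimal Python sketch of the skipping behaviour shown in the three queries above. `hive_concat_ws` is a hypothetical name, `None` stands in for NULL, and the sketch assumes the separator itself is never NULL.

```python
from typing import Optional


def hive_concat_ws(sep: str, *parts: Optional[str]) -> str:
    """Model of concat_ws: NULL (None) arguments are skipped, the rest are joined with sep."""
    return sep.join(p for p in parts if p is not None)


print(hive_concat_ws("-", "a", "b"))        # a-b
print(hive_concat_ws("-", "a", "b", None))  # a-b
print(hive_concat_ws("", "a", "b", None))   # ab
```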

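STR_TO_MAP function

(1) Syntax
STR_TO_MAP(VARCHAR text, VARCHAR listDelimiter, VARCHAR keyValueDelimiter)
(2) Description
Splits text into key-value pairs using listDelimiter, then splits each pair into a key and a value using keyValueDelimiter, and returns the result as a MAP. In Hive, the default listDelimiter is a comma (,) and the default keyValueDelimiter is a colon (:).
(3) Example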

str_to_map('1001=2020-03-10,1002=2020-03-10', ',', '=')
Output
{"1001":"2020-03-10","1002":"2020-03-10"}
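The two-stage split described above can be sketched in Python. This is an illustrative model only: `hive_str_to_map` is a hypothetical name, the delimiters are treated as plain strings, and the defaults `,` and `:` are assumed to match Hive's.

```python
def hive_str_to_map(text, list_delim=",", kv_delim=":"):
    """Model of str_to_map: split text into pairs, then split each pair once into key and value."""
    result = {}
    for pair in text.split(list_delim):
        # partition splits on the first occurrence only, so the value may contain kv_delim
        key, found, value = pair.partition(kv_delim)
        result[key] = value if found else None
    return result


print(hive_str_to_map("1001=2020-03-10,1002=2020-03-10", ",", "="))
# {'1001': '2020-03-10', '1002': '2020-03-10'}
```

Because each pair is split only at the first key-value delimiter, any later occurrence of the delimiter stays inside the value.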

collect_set function

Create the source table

create table stud (name string, area string, course string, score int);

Insert data into the source table

insert into table stud values('zhang3','bj','math',88);
insert into table stud values('li4','bj','math',99);
insert into table stud values('wang5','sh','chinese',92);
insert into table stud values('zhao6','sh','chinese',54);
insert into table stud values('tian7','bj','chinese',91);

Query the data in the table

hive (default)> select * from stud;

Aggregate the values from different rows of the same group into a set (duplicates are removed)

hive (default)> select course, collect_set(area), avg(score) from stud group by course;

chinese	["sh","bj"]	79.0
math	["bj"]	93.5

An index can be used to pick a single element from the set

hive (default)> select course, collect_set(area)[0], avg(score) from stud group by course;

chinese	sh	79.0
math	bj	93.5
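The grouping above can be reproduced with a small Python model of the stud table. This is a sketch under assumptions: `group_by_course` is a hypothetical helper, and it keeps areas in first-seen order, whereas the element order of collect_set in Hive should not be relied on.

```python
from collections import defaultdict

# Same rows as the insert statements above: (name, area, course, score)
rows = [
    ("zhang3", "bj", "math", 88),
    ("li4", "bj", "math", 99),
    ("wang5", "sh", "chinese", 92),
    ("zhao6", "sh", "chinese", 54),
    ("tian7", "bj", "chinese", 91),
]


def group_by_course(rows):
    """Model of: select course, collect_set(area), avg(score) ... group by course."""
    areas = defaultdict(list)
    scores = defaultdict(list)
    for _name, area, course, score in rows:
        if area not in areas[course]:  # collect_set drops duplicates
            areas[course].append(area)
        scores[course].append(score)
    return {c: (areas[c], sum(scores[c]) / len(scores[c])) for c in areas}


result = group_by_course(rows)
print(result["chinese"])  # (['sh', 'bj'], 79.0)
print(result["math"])     # (['bj'], 93.5)
```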

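nvl function

Basic syntax
NVL(expr1, expr2)
If expr1 is NULL, NVL returns the value of expr2; otherwise it returns the value of expr1. The purpose of the function is to replace a NULL with an actual value. The expressions may be numeric, string, or date values, but expr1 and expr2 must have the same data type.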
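The rule above amounts to a single conditional. A minimal Python sketch follows, with `None` standing in for NULL; `nvl` here is a plain Python function written for illustration, not a Hive binding.

```python
def nvl(value, default):
    """Model of NVL: return default only when value is NULL (None)."""
    return default if value is None else value


print(nvl(None, 0))  # 0
print(nvl(5, 0))     # 5
print(nvl("", "x"))  # prints an empty line: an empty string is not NULL
```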

Date functions

(1) date_format function (formats a date according to a pattern)

hive (default)> select date_format('2020-07-29','yyyy-MM');

2020-07

(2) date_add function (adds or subtracts days)

hive (default)> select date_add('2020-07-29',-1);
2020-07-28
hive (default)> select date_add('2020-07-29',+1);
2020-07-30

(3) next_day function
Get the first Monday after the given date

hive (default)> select next_day('2020-07-12','MO');
2020-07-13

Note: the English names for the days of the week, Monday through Sunday, are Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.
Get the Monday of the current week

hive (default)> select date_add(next_day('2020-07-12','MO'),-7);
2020-07-06

(4) last_day function (returns the last day of the month)

hive (default)> select last_day('2020-07-12');
2020-07-31
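The date arithmetic in these examples can be checked with Python's standard library. The helpers below are hypothetical models of the Hive functions, not Hive code; the weekday is passed as an integer (Monday=0 through Sunday=6) instead of a string such as 'MO'.

```python
import calendar
from datetime import date, timedelta


def date_add(d: date, days: int) -> date:
    """Model of date_add: shift a date by a signed number of days."""
    return d + timedelta(days=days)


def next_day(d: date, weekday: int) -> date:
    """Model of next_day: first date strictly after d that falls on the given weekday."""
    delta = (weekday - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=delta)


def last_day(d: date) -> date:
    """Model of last_day: last calendar day of d's month."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


d = date(2020, 7, 12)                        # a Sunday
print(date(2020, 7, 29).strftime("%Y-%m"))   # 2020-07
print(date_add(date(2020, 7, 29), -1))       # 2020-07-28
print(next_day(d, 0))                        # 2020-07-13
print(date_add(next_day(d, 0), -7))          # 2020-07-06
print(last_day(d))                           # 2020-07-31
```

Taking the next Monday and subtracting seven days yields the Monday of the week that contains the given date, which is why the combination is used for week-based reporting.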

Comprehensive exercise

Requirement
Create data under /export with the following content:

3210	1001	2020-03-10 00:00:00.0
3211	1001	2020-03-10 00:00:00.0
3212	1001	2020-03-10 00:00:00.0
3210	1002	2020-03-10 00:00:00.0
3211	1002	2020-03-10 00:00:00.0
3212	1002	2020-03-10 00:00:00.0
3210	1005	2020-03-10 00:00:00.0
3211	1004	2020-03-10 00:00:00.0
3212	1004	2020-03-10 00:00:00.0

Create a table in Hive

create table test(
 id int,
 status int,
 ts string
 )
row format delimited fields terminated by '\t';

Load the data into the table (Load)

load data local inpath '/export/data' into table test;

Steps

hive (default)> select id,concat(status,"=",ts) from test; 

3210	1001=2020-03-10 00:00:00.0
3211	1001=2020-03-10 00:00:00.0
3212	1001=2020-03-10 00:00:00.0
3210	1002=2020-03-10 00:00:00.0
3211	1002=2020-03-10 00:00:00.0
3212	1002=2020-03-10 00:00:00.0
3210	1005=2020-03-10 00:00:00.0
3211	1004=2020-03-10 00:00:00.0
3212	1004=2020-03-10 00:00:00.0

select id,collect_set(concat(status,"=",ts)) from test group by id;

3210	["1001=2020-03-10 00:00:00.0","1002=2020-03-10 00:00:00.0","1005=2020-03-10 00:00:00.0"]
3211	["1001=2020-03-10 00:00:00.0","1002=2020-03-10 00:00:00.0","1004=2020-03-10 00:00:00.0"]
3212	["1001=2020-03-10 00:00:00.0","1002=2020-03-10 00:00:00.0","1004=2020-03-10 00:00:00.0"]

select id,concat_ws(',',collect_set(concat(status,"=",ts))) from test group by id;

3210	1001=2020-03-10 00:00:00.0,1002=2020-03-10 00:00:00.0,1005=2020-03-10 00:00:00.0
3211	1001=2020-03-10 00:00:00.0,1002=2020-03-10 00:00:00.0,1004=2020-03-10 00:00:00.0
3212	1001=2020-03-10 00:00:00.0,1002=2020-03-10 00:00:00.0,1004=2020-03-10 00:00:00.0

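Called without explicit delimiters, str_to_map splits each pair on the first colon (:) inside the timestamp rather than on the equals sign (=), so the keys and values come out wrong:

select id,str_to_map(concat_ws(',',collect_set(concat(status,"=",ts)))) from test group by id;

3210	{"1001=2020-03-10 00":"00:00.0","1002=2020-03-10 00":"00:00.0","1005=2020-03-10 00":"00:00.0"}
3211	{"1001=2020-03-10 00":"00:00.0","1002=2020-03-10 00":"00:00.0","1004=2020-03-10 00":"00:00.0"}
3212	{"1001=2020-03-10 00":"00:00.0","1002=2020-03-10 00":"00:00.0","1004=2020-03-10 00":"00:00.0"}

Pass both delimiters explicitly so that status becomes the key and ts becomes the value:

select id,str_to_map(concat_ws(',',collect_set(concat(status,"=",ts))),',','=') from test group by id;

3210	{"1001":"2020-03-10 00:00:00.0","1002":"2020-03-10 00:00:00.0","1005":"2020-03-10 00:00:00.0"}
3211	{"1001":"2020-03-10 00:00:00.0","1002":"2020-03-10 00:00:00.0","1004":"2020-03-10 00:00:00.0"}
3212	{"1001":"2020-03-10 00:00:00.0","1002":"2020-03-10 00:00:00.0","1004":"2020-03-10 00:00:00.0"}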
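The whole pipeline (concat, collect_set, concat_ws, str_to_map) can be traced in Python to see how the key-value delimiter affects the final map. This is a sketch under assumptions: `str_to_map` and `aggregate` are hypothetical helpers, delimiters are treated as plain strings, and pairs keep first-seen order.

```python
# Same rows as the data file above: (id, status, ts)
rows = [
    (3210, 1001, "2020-03-10 00:00:00.0"),
    (3211, 1001, "2020-03-10 00:00:00.0"),
    (3212, 1001, "2020-03-10 00:00:00.0"),
    (3210, 1002, "2020-03-10 00:00:00.0"),
    (3211, 1002, "2020-03-10 00:00:00.0"),
    (3212, 1002, "2020-03-10 00:00:00.0"),
    (3210, 1005, "2020-03-10 00:00:00.0"),
    (3211, 1004, "2020-03-10 00:00:00.0"),
    (3212, 1004, "2020-03-10 00:00:00.0"),
]


def str_to_map(text, list_delim=",", kv_delim=":"):
    """Split text into pairs, then split each pair at the first kv_delim."""
    result = {}
    for pair in text.split(list_delim):
        key, found, value = pair.partition(kv_delim)
        result[key] = value if found else None
    return result


def aggregate(rows, kv_delim):
    """Model of: str_to_map(concat_ws(',', collect_set(concat(status,'=',ts))), ',', kv_delim)."""
    grouped = {}
    for id_, status, ts in rows:
        pairs = grouped.setdefault(id_, [])
        pair = f"{status}={ts}"          # concat(status, "=", ts)
        if pair not in pairs:            # collect_set drops duplicates
            pairs.append(pair)
    return {id_: str_to_map(",".join(pairs), ",", kv_delim) for id_, pairs in grouped.items()}


print(aggregate(rows, ":")[3210])  # keys like '1001=2020-03-10 00', values '00:00.0'
print(aggregate(rows, "=")[3210])  # keys like '1001', values '2020-03-10 00:00:00.0'
```

Splitting on `:` cuts each pair inside the timestamp, while splitting on `=` yields status as the key and the full timestamp as the value.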