1. Concatenating another column's values after GROUP BY

select uid,concat_ws('|', collect_set(device)) from tmp_test group by uid;

collect_set is a built-in Hive aggregate function. It returns the collected values with duplicates removed, and its return type is array.
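A few hedged variations on the query above may help. They rely on the tmp_test table with its uid and device columns from that query; device_id is a hypothetical numeric column introduced only for illustration. collect_set does not promise any element order, and concat_ws expects strings or arrays of strings, so sorting and casting are often useful.

```sql
-- Deterministic output order: sort the de-duplicated array before joining it.
SELECT uid,
       concat_ws('|', sort_array(collect_set(device))) AS devices
FROM tmp_test
GROUP BY uid;

-- concat_ws only accepts strings or arrays of strings, so cast other types first.
-- device_id is a hypothetical numeric column, not part of the original table.
SELECT uid,
       concat_ws('|', collect_set(CAST(device_id AS STRING))) AS device_ids
FROM tmp_test
GROUP BY uid;

-- For comparison, collect_list keeps duplicate values.
SELECT uid,
       concat_ws('|', collect_list(device)) AS devices_with_dups
FROM tmp_test
GROUP BY uid;
```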

MySQL uses group_concat(column separator 'delimiter') for the same purpose.

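-- Correct usage: pass the delimiter with the SEPARATOR keyword
select id, group_concat(col separator ';') from aa group by id;

-- Incorrect: ';' is treated as a second expression appended to each value, and the rows are still joined with the default ',', so the values end up separated by ';,'
select id, group_concat(col, ';') from aa group by id;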

https://www.cnblogs.com/fpcing/p/10452245.html
https://www.iteye.com/blog/amosjiayou-2274542

2. Getting the size of the array returned by collect_set()

Use size(collect_set(***)) to get the number of elements in the resulting array.
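As a minimal sketch, assuming the tmp_test table with uid and device columns from section 1, size can be applied directly to the aggregate, both in the select list and in a HAVING filter:

```sql
-- Number of distinct devices per user, taken from the size of the collect_set array.
SELECT uid,
       size(collect_set(device)) AS device_cnt
FROM tmp_test
GROUP BY uid;

-- Keep only the users seen on more than one device.
SELECT uid
FROM tmp_test
GROUP BY uid
HAVING size(collect_set(device)) > 1;
```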

3. ParseException line 7:22 missing ) at ',' near '' during GROUP BY

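This is a bug in Hive itself. Every subquery needs an alias after it, and when several columns follow GROUP BY, the first column listed must not be prefixed with a table name. Removing the parentheses is also recommended.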

-- Every subquery carries an alias (t1, t2)
select a, b, c
  from (select a, b, c
          from (select a, b, c, rno
                  from some_table) t1
         where rno = 1) t2
 group by a, b, c;
 
-- Inside GROUPING SETS, the first column (b) has no table-name prefix
SELECT tab1.a,
       b, d,
       SUM(tab1.c)
FROM tab1 join tab2 on tab1.a = tab2.b
GROUP BY tab1.a, b, d
GROUPING SETS ((b, tab1.a, d));

https://blog.csdn.net/syfly007/article/details/18225327
https://blog.csdn.net/fengzheku/article/details/80599320

4. Some differences between Hive and Presto

For JSON data, Presto uses json_extract_scalar, while Hive uses get_json_object.
Both use INTERSECT to take the intersection of two result sets; see https://blog.csdn.net/youzi_yun/article/details/97621355
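The two JSON functions mentioned above take the same arguments: a JSON string and a JSONPath expression. A small side-by-side sketch follows; the JSON literal is made up for illustration.

```sql
-- Hive: extracts the value at $.user.name and returns it as a string (tom)
SELECT get_json_object('{"user":{"name":"tom","age":20}}', '$.user.name');

-- Presto: the same lookup, returned as a varchar (tom)
SELECT json_extract_scalar('{"user":{"name":"tom","age":20}}', '$.user.name');
```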

https://blog.csdn.net/u012535605/article/details/83857079
https://blog.csdn.net/whathellll/article/details/90671182
https://blog.csdn.net/circle2015/article/details/101372194

5. Timestamp conversion in Hive vs. Presto

Standard datetime string to Unix timestamp

Hive:

select unix_timestamp(cast('2017-08-30 10:36:15' as timestamp))

Presto:

select to_unixtime(cast('2017-08-30 10:36:15' as timestamp))

Unix timestamp to standard datetime string

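Presto:

select format_datetime(from_unixtime(1510284058), 'yyyy-MM-dd HH:mm:ss')

Hive (from_unixtime expects seconds, so a 13-digit millisecond value must be divided by 1000 first):

select from_unixtime(cast(1323308943123 / 1000 as bigint), 'yyyy-MM-dd HH:mm:ss')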

Along the way you may need to convert a timestamp stored as a string into a numeric type. Both Hive and Presto provide cast for this.

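cast(value AS type) → type: explicitly converts a value to the given type. It can turn a varchar value into a numeric type, and the reverse also works.
try_cast(value AS type) → type: like cast, but returns null if the conversion fails. Only Presto provides this function.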

Also note that the integer type is written int in Hive, whereas Presto uses integer. Writing the wrong type name in cast also raises an error.
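To tie the pieces of this section together, here is a hedged sketch of the string-to-number cast feeding the timestamp functions shown earlier. The literal values are made up for illustration. In Hive, a plain cast of an unparseable string already yields NULL, which is why only Presto needs try_cast for that behaviour.

```sql
-- Hive: string timestamp (in seconds) -> bigint -> formatted datetime
SELECT from_unixtime(CAST('1510284058' AS BIGINT), 'yyyy-MM-dd HH:mm:ss');

-- Presto: the same conversion
SELECT format_datetime(from_unixtime(CAST('1510284058' AS BIGINT)), 'yyyy-MM-dd HH:mm:ss');

-- Presto only: try_cast yields NULL instead of failing the query on bad input
SELECT try_cast('not-a-number' AS BIGINT);
```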

6. Random sampling in Hive

distribute by rand() sort by rand() limit 1000
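Placed in a full query, the clause could look like the sketch below. It assumes the tmp_test table from section 1 purely as a stand-in. distribute by rand() spreads the rows randomly across reducers, sort by rand() shuffles the rows within each reducer, and limit then keeps at most 1000 of them.

```sql
-- Draw up to 1000 random rows from tmp_test.
SELECT *
FROM tmp_test
DISTRIBUTE BY rand()
SORT BY rand()
LIMIT 1000;
```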

https://www.jianshu.com/p/818e45384094

明日韭菜

