Hive Advanced Skills

1. Date format conversion (yyyymmdd to yyyy-mm-dd)

-- note: Hive date patterns follow Java SimpleDateFormat, so month is 'MM' (lowercase 'mm' means minutes)
select from_unixtime(unix_timestamp('20180905','yyyyMMdd'),'yyyy-MM-dd')
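
The reverse conversion works the same way; a minimal sketch going from yyyy-MM-dd back to yyyyMMdd:

select from_unixtime(unix_timestamp('2018-09-05','yyyy-MM-dd'),'yyyyMMdd')  -- returns '20180905'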

2. Remove all characters other than digits and letters from a Hive field

select regexp_replace(a, '[^0-9a-zA-Z]', '') from tbl_name
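
For a quick check, the same expression can be run on a literal:

select regexp_replace('ab_12-cd!@', '[^0-9a-zA-Z]', '')  -- returns 'ab12cd'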

3. Parsing JSON fields in Hive
Suppose the content field stores the JSON string {"score":"100","name":"zhou","class":"math"}; it can be parsed in the following ways:

--- parse individual fields
select get_json_object(content,'$.score'),
       get_json_object(content,'$.name'),
       get_json_object(content,'$.class')
from tbl_name
--- to parse several fields at once, use json_tuple
select a.*
      ,b.score
      ,b.name
      ,b.class
from tbl a
lateral view outer json_tuple(a.content, 'score', 'name', 'class') b as score, name, class
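
Either function can be tried directly on a literal; a minimal sketch:

select get_json_object('{"score":"100","name":"zhou","class":"math"}', '$.name')  -- returns 'zhou'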

4. Importing data into Hive
When loading from the local file system, add the local keyword; when loading directly from an HDFS path, omit local.

load data [local] inpath '/data/monthcard.csv' overwrite into table tbl_name;
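
A sketch of both variants, with hypothetical paths:

--- from the local file system (the file is copied)
load data local inpath '/home/hadoop/monthcard.csv' overwrite into table tbl_name;
--- from an HDFS path (the file is moved into the table's directory)
load data inpath '/data/monthcard.csv' overwrite into table tbl_name;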

5. Avoiding scientific notation in Hive

select printf("%.2f",3.428777027500007E7)
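
printf returns a string; casting to decimal is an alternative that keeps the result numeric, assuming the value fits the chosen precision:

select cast(3.428777027500007E7 as decimal(18,2))  -- returns 34287770.28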

6. Usage of collect_set and lateral view explode in Hive
Raw data:

id1    id2    name
1       1       A
1       1       B
1       1       C
1       2       X
1       2       Y

(1)collect_set

select id1,id2,
collect_set(name) as new_name1,
collect_set(case when id2>1 then name end) as new_name2,
count(name) as cnt
from default.zql_test
group by id1,id2;

--- output:
OK
id1  id2  new_name1      new_name2    cnt
1    1    ["C","A","B"]  []           3
1    2    ["X","Y"]      ["X","Y"]    2

(2)lateral view explode

select *
from
(
select id1,id2,
collect_set(name) as new_name1,
collect_set(case when id2>1 then name end) as new_name2,
count(name) as cnt
from default.zql_test
group by id1,id2
) t
lateral view explode(new_name1) t as new_type1
lateral view explode(new_name2) t as new_type2;

--- output:
OK
t.id1  t.id2  t.new_name1  t.new_name2  t.cnt  t.new_type1  t.new_type2
1      2      ["Y","X"]    ["Y","X"]    2      Y            Y
1      2      ["Y","X"]    ["Y","X"]    2      Y            X
1      2      ["Y","X"]    ["Y","X"]    2      X            Y
1      2      ["Y","X"]    ["Y","X"]    2      X            X

(3) lateral view outer explode: adding the outer keyword keeps all records, including groups whose exploded array is empty (which plain explode drops, as the id1=1, id2=1 group was above); compare the two outputs.

select *
from
(
select id1,id2,
collect_set(name) as new_name1,
collect_set(case when id2>1 then name end) as new_name2,
count(name) as cnt
from default.zql_test
group by id1,id2
) t
lateral view outer explode(new_name1) t as new_type1
lateral view outer explode(new_name2) t as new_type2;

--- output:
OK
t.id1  t.id2  t.new_name1    t.new_name2  t.cnt  t.new_type1  t.new_type2
1      1      ["B","A","C"]  []           3      B            NULL
1      1      ["B","A","C"]  []           3      A            NULL
1      1      ["B","A","C"]  []           3      C            NULL
1      2      ["X","Y"]      ["X","Y"]    2      X            X
1      2      ["X","Y"]      ["X","Y"]    2      X            Y
1      2      ["X","Y"]      ["X","Y"]    2      Y            X
1      2      ["X","Y"]      ["X","Y"]    2      Y            Y

7. Taking the top percentage of rows within a group in Hive

--- split the rows within each group into two buckets
ntile(2) over(partition by id order by create_tm)
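
ntile is a window function, so the fragment above needs a surrounding query; a minimal sketch, assuming a hypothetical table orders with columns id and create_tm, that keeps the earliest 50% of rows per id:

select id, create_tm
from
(
select id, create_tm,
       ntile(2) over(partition by id order by create_tm) as bucket
from orders
) t
where bucket = 1;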

8. Returning the day of the week in Hive

--- 2012-01-01 happens to be a Sunday
select pmod(datediff(from_unixtime(unix_timestamp()),'2012-01-01'),7) from default.dual;  -- returns 0-6, where 0 means Sunday
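
The same formula applied to a fixed date, as a sanity check:

select pmod(datediff('2018-09-05','2012-01-01'),7)  -- returns 3, i.e. Wednesday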

9. Generating a UUID in Hive

select regexp_replace(reflect("java.util.UUID", "randomUUID"), "-", "");
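
The regexp_replace merely strips the hyphens; if the standard hyphenated form is acceptable, the reflect call alone suffices:

select reflect("java.util.UUID", "randomUUID");  -- standard 36-character hyphenated UUID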

10. Matching Chinese characters in Hive

select '中文test' regexp '[\\u4e00-\\u9fa5]';  -- returns true when the string contains Chinese characters
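
In practice the pattern is usually applied to a column as a filter; a minimal sketch with hypothetical table and column names:

select * from tbl_name where name regexp '[\\u4e00-\\u9fa5]';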

11. Usage of regexp_extract in Hive
regexp_extract(string subject, string pattern, int index)
Description: extracts the part of the string subject that matches group index of the regular expression pattern.

First parameter: the field to process
Second parameter: the regular expression to match
Third parameter:
0 returns the entire match
1 returns the contents of the first parenthesized group
2 returns the second group, and so on

Example:
-- extract a run of exactly 17 consecutive digits, bounded by non-digit characters on both ends

select regexp_extract('1、非订单号(20位):00123456789876543210; 2、订单号(17位):12345678987654321; 3、其它文字', '[^\\d](\\d{17})[^\\d]', 0) as s1,
       substr(regexp_extract('1、非订单号(20位):01234567898765432100; 2、订单号(17位):12345678987654321; 3、其它文字', '[^\\d](\\d{17})[^\\d]', 0), 2, 17) as s2,
       regexp_extract('1、非订单号(20位):00123456789876543210; 2、订单号(17位):12345678987654321; 3、其它文字', '[^\\d](\\d{17})[^\\d]', 1) as s3;
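
For reference, the expected results; the three columns differ only in how the non-digit boundary characters are handled:

--- output:
s1 = ':12345678987654321;'  -- index 0 returns the entire match, boundaries included
s2 = '12345678987654321'    -- substr(..., 2, 17) trims the boundaries off the full match
s3 = '12345678987654321'    -- index 1 returns just the first group, usually the cleanest option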



Link: https://www.jianshu.com/p/fe1cdd06f5f8

 


Origin www.cnblogs.com/Allen-rg/p/10986311.html