每天一点点,记录工作中实操可行
hive json数组解析
hive中有字段A长这个样子,想把其中的name值全部解析出来
[{"itemRateId":"73288842","name":"东北有机大米饭","rating":4,"ratingContent":""},{"itemRateId":"73288850","name":"需要餐具","rating":4,"ratingContent":""},{"itemRateId":"73288834","name":"辣跳冒牛百叶","rating":4,"ratingContent":""},{"itemRateId":"73288826","name":"金汤酸菜鱼套餐(鱼+配菜+小菜+米饭)-现炸酥肉(+10元)","rating":4,"ratingContent":""}]
1: get_json_object 每次只能解析一个字段,但我不知道每个字段A到底有多少个name的键值对,不可能挨个解析,所以。。。。。
select get_json_object(被解析字段A,"$.[0].name") column_name
from table_name
结果,只把第一个name值解析出来了
劲爽金汤酸菜鱼小份(米饭另点)-大份
2:先通过explode把原数据变成n行数据,然后再用get_json_object去解析
select ss.col
from
(select
split(regexp_replace(regexp_extract(被解析字段A,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|') as str
from table_name ) pp
lateral view explode(pp.str) ss as col
输出结果
{"itemRateId":"81980274","name":"劲爽金汤酸菜鱼小份(米饭另点)-大份","rating":2,"ratingContent":""}
{"itemRateId":"81980290","name":"需要餐具","rating":2,"ratingContent":""}
{"itemRateId":"81980266","name":"崇明有机大米饭","rating":2,"ratingContent":""}
{"itemRateId":"81980282","name":"下单前+..领大额红包","rating":2,"ratingContent":""}
{"itemRateId":"81980258","name":"肉麻青花椒鱼小份(米饭另点)-大份","rating":2,"ratingContent":""}
3:用json_tuple 分别解析每行数据
select rr.itemRateId,rr.name,rr.rating
from (
select
split(regexp_replace(regexp_extract(被解析字段A,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|') as str
from table_name ) pp
lateral view explode(pp.str) ss as col
lateral view json_tuple(ss.col,'itemRateId','name','rating') rr as itemRateId,name,rating
输出结果,已经将该字段解析成了5行
itemRateId name rating
81980274 劲爽金汤酸菜鱼小份(米饭另点)-大份 2
81980290 需要餐具 2
81980266 崇明有机大米饭 2
81980282 下单前+..领大额红包 2
81980258 肉麻青花椒鱼小份(米饭另点)-大份 2
4:如果把原来一行的数据,解析成多行,数据量就会变得很大,这个时候,可以再把解析出来的多行字段,合并成一个字段
select 字段1,字段2,
concat_ws(',', collect_set(rr.name)) as z_name
from table_name
group by 字段1,字段2
输出结果:
字段1,字段2,劲爽金汤酸菜鱼小份(米饭另点)-大份,需要餐具,崇明有机大米饭,下单前+…领大额红包,肉麻青花椒鱼小份(米饭另点)
这样,就把原来多数组的json字段,解析,且合并成一个字段啦
有点麻烦,但可以解决部分工作