hive json数组解析explode,json_tuple

每天一点点,记录工作中实操可行
hive json数组解析

hive中有字段A长这个样子,想把其中的name值全部解析出来

[{"itemRateId":"73288842","name":"东北有机大米饭","rating":4,"ratingContent":""},{"itemRateId":"73288850","name":"需要餐具","rating":4,"ratingContent":""},{"itemRateId":"73288834","name":"辣跳冒牛百叶","rating":4,"ratingContent":""},{"itemRateId":"73288826","name":"金汤酸菜鱼套餐(鱼+配菜+小菜+米饭)-现炸酥肉(+10元)","rating":4,"ratingContent":""}]

1: get_json_object 每次只能解析一个字段,但我不知道每个字段A到底有多少个name的键值对,不可能挨个解析,所以。。。。。

select  get_json_object(被解析字段A,"$.[0].name")  column_name 
from table_name

结果,只把第一个name值解析出来了

劲爽金汤酸菜鱼小份(米饭另点)-大份

2:先通过explode把原数据变成n行数据,然后再用get_json_object去解析

select  ss.col
from 
(select 
split(regexp_replace(regexp_extract(被解析字段A,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|') as str
from table_name  ) pp
lateral view explode(pp.str) ss as col 

输出结果

{"itemRateId":"81980274","name":"劲爽金汤酸菜鱼小份(米饭另点)-大份","rating":2,"ratingContent":""}
{"itemRateId":"81980290","name":"需要餐具","rating":2,"ratingContent":""}
{"itemRateId":"81980266","name":"崇明有机大米饭","rating":2,"ratingContent":""}
{"itemRateId":"81980282","name":"下单前+..领大额红包","rating":2,"ratingContent":""}
{"itemRateId":"81980258","name":"肉麻青花椒鱼小份(米饭另点)-大份","rating":2,"ratingContent":""}

3:用json_tuple 分别解析每行数据

select  rr.itemRateId,rr.name,rr.rating
from (
select 
split(regexp_replace(regexp_extract(被解析字段A,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|') as str
from table_name ) pp
lateral view explode(pp.str) ss as col 
lateral view json_tuple(ss.col,'itemRateId','name','rating') rr as itemRateId,name,rating

输出结果,已经将该字段解析成了5行

itemRateId          name                                                rating
81980274	劲爽金汤酸菜鱼小份(米饭另点)-大份	2
81980290	需要餐具	2
81980266	崇明有机大米饭	2
81980282	下单前+..领大额红包	2
81980258	肉麻青花椒鱼小份(米饭另点)-大份	2

4:如果把原来一行的数据,解析成多行,数据量就会变得很大,这个时候,可以再把解析出来的多行字段,合并成一个字段

select 字段1,字段2,
concat_ws(',', collect_set(rr.name)) as z_name
from table_name
group by 字段1,字段2

输出结果:
字段1,字段2,劲爽金汤酸菜鱼小份(米饭另点)-大份,需要餐具,崇明有机大米饭,下单前+…领大额红包,肉麻青花椒鱼小份(米饭另点)

这样,就把原来多数组的json字段,解析,且合并成一个字段啦
有点麻烦,但可以解决部分工作

发布了124 篇原创文章 · 获赞 23 · 访问量 2万+

猜你喜欢

转载自blog.csdn.net/YmeBtc/article/details/101710477