Hive: Rows to Columns & Columns to Rows

Rows to columns

  • concat_ws:
hive (wzj)> desc function extended concat_ws;
OK
tab_name
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.
Example:
  > SELECT concat_ws('.', 'www', array('facebook', 'com')) FROM src LIMIT 1;
  'www.facebook.com'
Time taken: 0.029 seconds, Fetched: 4 row(s)
  • concat:
hive (wzj)> desc function extended concat;
OK
tab_name
concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data  bin1, bin2, ... binN
Returns NULL if any argument is NULL.
Example:
  > SELECT concat('abc', 'def') FROM src LIMIT 1;
  'abcdef'
Time taken: 0.021 seconds, Fetched: 5 row(s)
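
The practical difference between concat and concat_ws shows up with NULL arguments. As the help text above states, concat returns NULL if any argument is NULL. concat_ws, as far as I know, skips NULL arguments instead and only returns NULL when the separator itself is NULL. A minimal sketch to check this on your own cluster, assuming a Hive version that accepts SELECT without a FROM clause (0.13 or later); the column aliases are made up for illustration:

```sql
-- concat: a single NULL argument makes the whole result NULL.
-- concat_ws: NULL arguments are expected to be skipped, not propagated.
-- The CAST gives the NULL literal an explicit string type.
SELECT
  concat('dept01', ',', CAST(NULL AS STRING))    AS with_concat,
  concat_ws(',', 'dept01', CAST(NULL AS STRING)) AS with_concat_ws;
```

This matters when building a grouping key from nullable columns: with concat, one NULL column turns the whole key into NULL.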
  • collect_set:
hive (wzj)> desc function extended collect_set;
OK
tab_name
collect_set(x) - Returns a set of objects with duplicate elements eliminated
Time taken: 0.042 seconds, Fetched: 1 row(s)
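
To see the deduplication at work, collect_set can be compared with its sibling collect_list, which keeps duplicates. The sketch below is not from the original session; it assumes Hive 0.13 or later (where collect_list is available) and uses the hzl_concat_ws table created in the implementation step further down, in which all five rows have grade A. The aliases are illustrative.

```sql
-- Every row in hzl_concat_ws has grade = 'A'.
SELECT
  collect_set(grade)  AS distinct_grades,  -- one element: ["A"]
  collect_list(grade) AS all_grades        -- five elements, one "A" per row
FROM hzl_concat_ws;
```

Both functions are aggregates, so they return one array per group (here, one array for the whole table because there is no GROUP BY).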
  • Requirement
jack,dept01,A
jerry,dept02,A
wzj,dept01,A
qwe,dept02,A
asd,dept03,A

==>

dept01 A jack|wzj
dept02 A jerry|qwe
dept03 A asd
  • Implementation
hive (wzj)> create table hzl_concat_ws(name string,dept string,grade string)row format delimited fields terminated by ',';
OK
Time taken: 0.166 seconds
hive (wzj)> load data local inpath '/home/wzj/data/dept.txt' into table hzl_concat_ws;
Loading data to table wzj.hzl_concat_ws
Table wzj.hzl_concat_ws stats: [numFiles=1, numRows=0, totalSize=66, rawDataSize=0]
OK
Time taken: 0.621 seconds
hive (wzj)> select * from hzl_concat_ws;
OK
hzl_concat_ws.name	hzl_concat_ws.dept	hzl_concat_ws.grade
ck	dept01	A
jerry	dept02	A
wzj	dept01	A
qwe	dept02	A
asd	dept03	A
Time taken: 0.09 seconds, Fetched: 5 row(s)

hive (wzj)> select t.grade_concat,concat_ws('|',t.name) from(
          > select name,concat(dept,',',grade) as grade_concat from hzl_concat_ws) t
          > group by t.grade_concat;
FAILED: SemanticException [Error 10002]: Line 1:38 Invalid column reference 'name'

The first attempt fails because name is neither listed in GROUP BY nor wrapped in an aggregate function. Wrapping it in collect_set gathers the names of each group into an array, which concat_ws then joins with '|':

hive (wzj)> select t.grade_concat,concat_ws('|',collect_set(t.name)) from(
          > select name,concat(dept,',',grade) as grade_concat from hzl_concat_ws) t
          > group by t.grade_concat
          > ;
Ended Job = job_1585756506667_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.49 sec   HDFS Read: 8655 HDFS Write: 48 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 490 msec
OK
t.grade_concat	_c1
dept01,A	ck|wzj
dept02,A	jerry|qwe
dept03,A	asd
Time taken: 39.233 seconds, Fetched: 3 row(s)
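
The requirement lists dept and grade as separate columns, whereas the query above merges them into a single comma-joined key. A possible variant, sketched here and not run in the original session, groups by both columns directly and so needs no subquery. The alias names is an assumption added for readability.

```sql
-- One output row per (dept, grade) pair; names in the group joined with '|'.
SELECT
  dept,
  grade,
  concat_ws('|', collect_set(name)) AS names
FROM hzl_concat_ws
GROUP BY dept, grade;
```

Note that collect_set makes no guarantee about element order, so the names within a group may come out in a different order from the input file.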

Columns to rows

  • explode:
hive (wzj)> desc function extended explode;
OK
tab_name
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns 
Time taken: 0.019 seconds, Fetched: 1 row(s)

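In other words, explode turns each element of an array into its own row, and each entry of a map into its own row with separate key and value columns.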
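
The session below only exercises the array form. For the map form, a minimal sketch with made-up data (the keys, scores, and column aliases are all hypothetical) could look like this, assuming a Hive version that accepts SELECT without a FROM clause:

```sql
-- Each map entry becomes one row; the key and the value land in separate
-- columns, named by the two aliases in parentheses.
SELECT explode(map('math', 90, 'english', 85)) AS (subject, score);
```

With two entries in the map, the result should contain two rows of two columns each.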
  • Requirement:
jack 语文,数学,英语
jerry 语文,数学
wzj 英语,物理

==>

jack 语文
jack 数学
jack 英语
...
  • Implementation
hive (wzj)> create table lzh_array(name string,loaction array<string>) row format delimited fields terminated by '\t' collection items terminated by ','; 
OK
Time taken: 0.086 seconds

hive (wzj)> load data local inpath '/home/wzj/data/lzh_arrat.txt' into table lzh_array;
Loading data to table wzj.lzh_array
Table wzj.lzh_array stats: [numFiles=1, numRows=0, totalSize=64, rawDataSize=0]
OK
Time taken: 0.354 seconds
hive (wzj)> select * from lzh_array;
OK
lzh_array.name	lzh_array.loaction
jack	["语文","数学","英语"]
jerry	["语文","数学"]
wzj	["英语","物理"]
Time taken: 0.111 seconds, Fetched: 3 row(s)
hive (wzj)> select explode(loaction) from lzh_array;
OK
col
语文
数学
英语
语文
数学
英语
物理
Time taken: 0.104 seconds, Fetched: 7 row(s)
hive (wzj)> select name,subject from lzh_array lateral view explode(loaction) tmp as subject;
OK
name	subject
jack	语文
jack	数学
jack	英语
jerry	语文
jerry	数学
wzj	英语
wzj	物理
Time taken: 0.084 seconds, Fetched: 7 row(s)
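
Selecting name next to a bare explode(loaction) is rejected by Hive, because a table-generating function cannot share the SELECT list with other expressions. LATERAL VIEW is what joins each generated row back to the row it came from. One caveat: a plain LATERAL VIEW drops source rows whose array is empty or NULL. If such rows should be kept, LATERAL VIEW OUTER is an option, sketched below against the same table (not run in the original session):

```sql
-- OUTER keeps rows with an empty or NULL array and emits NULL for subject.
SELECT name, subject
FROM lzh_array
LATERAL VIEW OUTER explode(loaction) tmp AS subject;
```

None of the three sample rows has an empty array, so on this data the result should match the output above.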

Reposted from blog.csdn.net/wzj_wp/article/details/105533538