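1 Row-to-column conversion
1.1 Functions
CONCAT(string A/col, string B/col…): returns the concatenation of the input strings; accepts any number of input strings;
CONCAT_WS(separator, str1, str2,...): a special form of CONCAT(). The first argument is the separator placed between the remaining arguments. The separator can be any string, just like the remaining arguments. If the separator is NULL, the result is also NULL. The function skips any NULL values after the separator argument, but it does not skip empty strings. The separator is inserted between the strings being joined;
COLLECT_SET(col): the function accepts primitive data types. It is an aggregate function that collects the distinct values of a column into a field of type array, i.e. it turns the values of a column into an array.
1.1.1 concat: string concatenation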
Use desc function to look up a built-in function. desc formatted describes tables, so passing it a function name fails with "Table not found":
0: jdbc:hive2://linux01:10000> desc formatted concat ;
FAILED: SemanticException [Error 10001]: Table not found concat
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Table not found concat (state=42S02,code=10001)
0: jdbc:hive2://linux01:10000> desc function concat ;
OK
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data bin1, bin2, ... binN |
+----------------------------------------------------+
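Beyond the one-line summary, Hive can usually print a usage example for a built-in function as well. A minimal sketch using the standard DESCRIBE FUNCTION EXTENDED and SHOW FUNCTIONS statements; the exact output depends on the Hive version, so none is reproduced here:

```sql
-- Summary plus, where available, a usage example for the function
desc function extended concat;

-- List the functions registered in the current session
show functions;
```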
Example: concatenating string literals
0: jdbc:hive2://linux01:10000> select concat("a" , "-->","b","-->","c")
. . . . . . . . . . . . . . .> ;
OK
+------------+
| _c0 |
+------------+
| a-->b-->c |
+------------+
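concat is evaluated once per row: it joins several fields of each row in the queried table. Non-string arguments, such as the double column sal, are converted to strings automatically.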
0: jdbc:hive2://linux01:10000> select concat(ename,":",job,":",sal) from tb_emp;
OK
+--------------------------+
| _c0 |
+--------------------------+
| SMITH:CLERK:800.0 |
| ALLEN:SALESMAN:1600.0 |
| WARD:SALESMAN:1250.0 |
| JONES:MANAGER:2975.0 |
| MARTIN:SALESMAN:1250.0 |
| BLAKE:MANAGER:2850.0 |
| CLARK:MANAGER:2450.0 |
| SCOTT:ANALYST:3000.0 |
| KING:PRESIDENT:5000.0 |
| TURNER:SALESMAN:1500.0 |
| ADAMS:CLERK:1100.0 |
| JAMES:CLERK:950.0 |
| FORD:ANALYST:3000.0 |
| MILLER:CLERK:1300.0 |
| HUGUANYU:HANGGE:18000.0 |
+--------------------------+
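One behaviour the output above does not show is NULL handling: in Hive, concat returns NULL as soon as any argument is NULL. The sketch below illustrates this with literals and then guards a column with nvl(); it assumes the same tb_emp table, and the 'N/A' placeholder is only an illustrative choice, not something from the original notes:

```sql
-- Any NULL argument makes the whole result NULL
select concat('a', '-->', cast(null as string));

-- Guard a possibly-NULL column with nvl() so the rest of the row survives;
-- 'N/A' is a hypothetical placeholder value
select concat(ename, ':', nvl(job, 'N/A')) from tb_emp;
```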
1.1.2 CONCAT_WS(separator, str1, str2,...)
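Compared with concat, concat_ws lets you specify the field separator once instead of repeating it between every pair of arguments.
concat_ws(arg 1 (the separator), str1, str2, ...)
Example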
0: jdbc:hive2://linux01:10000> select concat_ws("_" , "tom","cat" ,"jim" ,"jerry") ;
OK
+--------------------+
| _c0 |
+--------------------+
| tom_cat_jim_jerry |
+--------------------+
0: jdbc:hive2://linux01:10000> select concat_ws(":" , ename ,job , sal) from tb_emp ;
FAILED: SemanticException [Error 10016]: Line 1:36 Argument type mismatch 'sal': Argument 4 of function CONCAT_WS must be "string or array<string>", but "double" was found.
Error: Error while compiling statement: FAILED: SemanticException [Error 10016]: Line 1:36 Argument type mismatch 'sal': Argument 4 of function CONCAT_WS must be "string or array<string>", but "double" was found. (state=42000,code=10016)
Unlike concat, concat_ws does not convert its arguments implicitly, so the double value must be cast to string. Syntax:
cast(expr AS data_type)    explicit type conversion
cast(sal as string)
select concat_ws(":" , ename ,job , cast(sal as string)) from tb_emp ;
OK
+--------------------------+
| _c0 |
+--------------------------+
| SMITH:CLERK:800.0 |
| ALLEN:SALESMAN:1600.0 |
| WARD:SALESMAN:1250.0 |
| JONES:MANAGER:2975.0 |
| MARTIN:SALESMAN:1250.0 |
| BLAKE:MANAGER:2850.0 |
| CLARK:MANAGER:2450.0 |
| SCOTT:ANALYST:3000.0 |
| KING:PRESIDENT:5000.0 |
| TURNER:SALESMAN:1500.0 |
| ADAMS:CLERK:1100.0 |
| JAMES:CLERK:950.0 |
| FORD:ANALYST:3000.0 |
| MILLER:CLERK:1300.0 |
| HUGUANYU:HANGGE:18000.0 |
+--------------------------+
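The error message above says that concat_ws accepts "string or array<string>" arguments, and the function description says NULL values after the separator are skipped. A minimal sketch of both behaviours using literals only (no table is assumed):

```sql
-- NULL arguments after the separator are skipped: returns tom_jerry
select concat_ws('_', 'tom', cast(null as string), 'jerry');

-- An array<string> argument is joined element by element: returns tom_cat_jim
select concat_ws('_', array('tom', 'cat', 'jim'));
```

The array form is what makes concat_ws combine naturally with collect_set and collect_list, which both return arrays.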
1.1.3 COLLECT_SET(col): collects values into a set
desc function collect_set ;
OK
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| collect_set(x) - Returns a set of objects with duplicate elements eliminated |
+----------------------------------------------------+
It operates on a single column of a table:
select deptno from tb_emp ;
OK
+---------+
| deptno |
+---------+
| 20 |
| 30 |
| 30 |
| 20 |
| 30 |
| 30 |
| 10 |
| 20 |
| 10 |
| 30 |
| 20 |
| 30 |
| 20 |
| 10 |
| 50 |
+---------+
select collect_set(deptno) from tb_emp ; -- array with duplicate elements removed
+----------------+
| _c0 |
+----------------+
| [20,30,10,50] |
+----------------+
collect_list(col) does not remove duplicates:
select collect_list(deptno) as deptno_list from tb_emp ;
+-------------------------------------------------+
| deptno_list |
+-------------------------------------------------+
| [20,30,30,20,30,30,10,20,10,30,20,30,20,10,50] |
+-------------------------------------------------+
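Both queries above aggregate the whole table into a single row. Because collect_set and collect_list are aggregate functions, they can also be used with group by to build one array per group. A minimal sketch against the same tb_emp table; the alias enames is an illustrative name:

```sql
-- One row per department, with its employee names joined into one string
select
    deptno,
    concat_ws('|', collect_set(ename)) as enames
from tb_emp
group by deptno;
```

The order of the elements inside each array is not guaranteed. To collect a non-string column such as deptno this way, cast it first, for example collect_set(cast(deptno as string)), since concat_ws only accepts string or array<string>.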
1.2 Row-to-column conversion
Expected result:
射手座,A 娜娜|凤姐
白羊座,A 孙悟空|猪八戒
白羊座,B 宋宋
Data:
孙悟空 白羊座 A
娜娜 射手座 A
宋宋 白羊座 B
猪八戒 白羊座 A
凤姐 射手座 A
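The three functions from section 1.1 can be combined to produce the expected result. The following is a sketch under stated assumptions, not necessarily the query the author had in mind: it assumes the data has already been loaded into a table named person_info with string columns name, constellation and blood_type. These names are hypothetical, since the notes give only the raw data.

```sql
-- Hypothetical table: person_info(name, constellation, blood_type)
select
    t.base,                                        -- e.g. 白羊座,A
    concat_ws('|', collect_set(t.name)) as names   -- e.g. 孙悟空|猪八戒
from (
    -- Step 1: build the grouping key from constellation and blood type
    select
        name,
        concat(constellation, ',', blood_type) as base
    from person_info
) t
-- Step 2: collect the names of each group into an array and join them
group by t.base;
```

concat builds the key per row, collect_set gathers the names of each group into an array, and concat_ws flattens that array into a single '|'-separated string. The order of names within a group is not guaranteed; use collect_list instead of collect_set if duplicate names must be preserved.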