[Big data Hive3.x data warehouse development] HiveSQL row-to-column application
1 Column to row: multiple columns to multiple rows – union
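This is the reverse of converting multiple rows to multiple columns.
union: the result is deduplicated. Any ordering you see is a side effect of deduplication and is not guaranteed; add order by when the order matters.
union all: the result is neither deduplicated nor sorted.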
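As a minimal sketch of the multi-column to multi-row conversion, assume a hypothetical wide table score_wide(name, math, english). The table and column names are illustrative and do not come from the original example. Each select turns one column into rows, and union all stacks the results:

```sql
-- Assumed (hypothetical) table: one row per student, one column per subject.
-- create table score_wide (name string, math int, english int);

-- Each branch turns one source column into rows; union all stacks the branches.
select name, 'math' as subject, math as score from score_wide
union all
select name, 'english' as subject, english as score from score_wide;
```

union all is usually the better fit here: union would also collapse rows that happen to be identical, which is rarely wanted when unpivoting, and it adds the cost of deduplication.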
2 Column to row: single column to multiple rows – UDTF function explode
explode: expands a collection, producing one row for each element of an array or each key-value pair of a map.
Note the required argument type: explode(Map|Array).
So to use explode on a string column, either declare the column as an array type when creating the table, or convert it to an array with split at query time:
select explode(split(col3,',')) from a;
Because one input row produces multiple output rows, explode is called a UDTF (user-defined table-generating function).
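A few standalone calls may make this behavior concrete. This is only a sketch that uses literal values rather than a real table; the aliases item, k, and v are arbitrary names chosen for illustration.

```sql
-- Array: one output row per element (a, b, c).
select explode(array('a', 'b', 'c')) as item;

-- String: convert to an array with split first, then explode.
select explode(split('a,b,c', ',')) as item;

-- Map: one output row per entry, with the key and the value in separate columns.
select explode(map('k1', 'v1', 'k2', 'v2')) as (k, v);
```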
Features:
- It can be used directly, on its own, in a select clause.
- It can be combined with lateral view.
- explode(array) produces one row for each element of the array.
- explode(map) produces one row for each key-value pair, with one column for the key and one column for the value.
select
col1
,col2
,lv.col3 as col3
from a
lateral view
explode(split(col3,',')) lv as col3;
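To see what this kind of query returns, here is a sketch that stands in a one-row inline dataset for table a (the sample values are made up), followed by the same pattern applied to a map column, where lateral view takes two column aliases. The alias item and the table t are hypothetical names.

```sql
-- Sample data standing in for table a (values are illustrative).
with a as (
  select 1 as col1, 'x' as col2, 'a,b,c' as col3
)
select
  col1
  ,col2
  ,lv.item as col3
from a
lateral view
  explode(split(col3, ',')) lv as item;
-- Produces three rows: (1, x, a), (1, x, b), (1, x, c)

-- Map column: lateral view names the key column and the value column.
with t as (
  select 1 as id, map('color', 'red', 'size', 'L') as props
)
select id, lv.k, lv.v
from t
lateral view explode(props) lv as k, v;
-- Produces one row per map entry: (1, color, red) and (1, size, L)
```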
UDTF function syntax restrictions – why lateral view?
To explain why explode is paired with lateral view, we need to look at the syntax restrictions on UDTFs. The result returned by explode can be thought of as a virtual table whose data comes from the source table.
Because the query reads only from the source table, it cannot return fields of the source table and fields of the virtual table at the same time.
The solution is to join the two tables, and Hive provides the special lateral view syntax for exactly this purpose.
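The restriction and its workaround can be sketched as follows, reusing table a from the query above. The first statement, left commented out, is expected to be rejected by Hive at compile time because a UDTF call sits next to an ordinary column in the select list. The second expresses the same intent with lateral view. The alias item is an arbitrary name.

```sql
-- Rejected: a UDTF cannot be mixed with other expressions in the select list.
-- select col1, explode(split(col3, ',')) from a;

-- Accepted: lateral view joins each row of a with the rows explode produces from it.
select col1, lv.item
from a
lateral view explode(split(col3, ',')) lv as item;
```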
Here is another example, which should make the syntax clearer:
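A minimal sketch of such an example, assuming a hypothetical dataset page_ads in which each page carries an array of ad ids. The table, columns, and values are illustrative and are not taken from the original article.

```sql
-- Hypothetical data: each page carries an array of ad ids.
with page_ads as (
  select 'front_page' as pageid, array(1, 2, 3) as adid_list
  union all
  select 'contact_page' as pageid, array(3, 4, 5) as adid_list
)
select pageid, ad.adid
from page_ads
lateral view explode(adid_list) ad as adid;
-- front_page yields adid 1, 2, 3; contact_page yields adid 3, 4, 5 (six rows in total)
```

In the lateral view clause, ad is the alias of the generated virtual table and adid is the name given to its single column.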
Hive lateral view principle
Lateral view treats the result of the UDTF as a view-like table, then joins each row of the original table with the rows the UDTF outputs for that row, generating a new virtual table. This avoids the usage restrictions on UDTFs.
Lateral view also lets you name the fields of the records generated by the UDTF. The generated fields can then be used in group by, order by, limit, and similar clauses, without wrapping the query in an extra subquery.
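As a sketch of using a generated field directly in group by, order by, and limit, the query below counts how often each ad id occurs across pages. The page_ads data is hypothetical and defined inline for illustration.

```sql
-- Hypothetical data: each page carries an array of ad ids.
with page_ads as (
  select 'front_page' as pageid, array(1, 2, 3) as adid_list
  union all
  select 'contact_page' as pageid, array(3, 4, 5) as adid_list
)
select ad.adid as adid, count(*) as cnt
from page_ads
lateral view explode(adid_list) ad as adid
group by ad.adid
order by cnt desc, adid
limit 3;
-- adid 3 appears on both pages, so it sorts first with cnt = 2
```

No subquery is needed: the column adid produced by lateral view is referenced in the grouping and ordering clauses like any ordinary column.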