[Big Data Hive3.x Data Warehouse Development] How HiveSQL uses explode and lateral view to convert a single column into multiple rows

[Big data Hive3.x data warehouse development] HiveSQL row-to-column application

1 Column to row: multiple columns to multiple rows – union

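This is the reverse of converting multiple rows to multiple columns.
union: the result is deduplicated; rows may come back sorted as a side effect of deduplication, but the order is not guaranteed without an order by.
union all: the result is neither deduplicated nor sorted.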
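As a rough illustration of the multi-column to multi-row conversion described above, here is a minimal sketch. The table scores and its columns name, math and english are hypothetical and do not come from the original post.

```sql
-- Hypothetical wide table: scores(name string, math int, english int),
-- one row per student and one column per subject.
-- Each branch turns one subject column into rows; union all stacks the
-- branches without deduplicating them.
select name, 'math' as subject, math as score from scores
union all
select name, 'english' as subject, english as score from scores;
```

union all is used here because every (name, subject) pair is already distinct, so paying for deduplication would add nothing.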

2 Column to row: single column to multiple rows – UDTF function explode

explode: expands the elements of an array or map, emitting one row per element.

Note the required argument type: explode(Map|Array).
To use explode, either declare the column as an array type when creating the table, or convert a delimited string into an array with split at query time: select explode(split(col3,',')) from a;
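To try the type requirement without creating a table, a small sketch with literal values can help. The alias item is just an illustrative name.

```sql
-- explode accepts an array directly: three elements give three rows.
select explode(array('a', 'b', 'c')) as item;

-- A delimited string must first be turned into an array with split.
select explode(split('a,b,c', ',')) as item;
```

Both statements return the three rows a, b and c.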

Because one input row produces multiple output rows, explode is called a UDTF (user-defined table-generating function).
Features:

  • can be used directly, on its own in a select
  • can be combined with lateral view (the side view)
    • explode(array) produces one row for each element of the array.
    • explode(map) produces one row for each key-value pair, with one column for the key and one column for the value.
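-- Keep col1 and col2 from the source table and emit one row per
-- comma-separated element of col3.
select
	col1
	,col2
	,lv.col3 as col3
from a
	lateral view
		explode(split(col3,',')) lv as col3;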
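The map form of explode works with lateral view in the same way, except that two column aliases are needed, one for the key and one for the value. The sketch below uses an inline subquery instead of a real table; the names t, id, scores, kv, subject and score are all hypothetical.

```sql
-- Inline one-row source so the example does not depend on an existing table.
select
	t.id
	,kv.subject
	,kv.score
from (
	select 1 as id, map('math', 90, 'english', 85) as scores
) t
	lateral view
		explode(t.scores) kv as subject, score;
```

This returns one row per map entry, each carrying the id of the source row.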

UDTF function syntax restrictions – why lateral view?

Using explode together with lateral view comes down to a syntax restriction on UDTFs. The result returned by explode can be understood as a virtual table whose data comes from the source table;
therefore, a query that selects only from the source table cannot return columns of the source table and columns of the virtual table at the same time.
The solution is to join the two tables, and Hive's special lateral view syntax exists to meet this need.
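The restriction can be sketched with the article's table a. The alias item is an illustrative name, and the rejected statement is left commented out.

```sql
-- Not allowed: a UDTF cannot share the select list with other columns,
-- so Hive rejects this statement at compile time.
-- select col1, explode(split(col3, ',')) from a;

-- Allowed: lateral view joins each source row to its exploded rows, so
-- source columns and generated columns can be selected together.
select a.col1, lv.item
from a
	lateral view
		explode(split(a.col3, ',')) lv as item;
```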

Hive lateral view principle

Lateral view turns the result of the UDTF into a view-like virtual table, then joins each row of the source table with the rows the UDTF produced for that row, generating a new virtual table. This avoids the usage restriction on UDTFs.
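A small worked example may make the row-by-row join concrete. The data is made up and supplied through a common table expression, so nothing here depends on a real table.

```sql
-- Two hypothetical source rows: 'x' carries two elements, 'y' carries one.
with a as (
	select 'x' as col1, '1,2' as col3
	union all
	select 'y' as col1, '3' as col3
)
select a.col1, lv.item
from a
	lateral view
		explode(split(a.col3, ',')) lv as item;
-- Result (row order is not guaranteed):
--   x  1
--   x  2
--   y  3
```

Each source row is paired only with the rows exploded from its own col3, which is why 'y' appears once and 'x' twice.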

With lateral view you can also assign column names to the rows generated by the UDTF. The generated columns can then be used in clauses such as group by, order by and limit, with no need to wrap the query in an extra subquery.
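For instance, counting how often each element occurs could look like the sketch below, which reuses the article's table a. The names item and cnt are illustrative.

```sql
-- The generated column lv.item is used directly in group by, and the
-- aggregate is ordered and limited in the same query, with no subquery.
select
	lv.item
	,count(*) as cnt
from a
	lateral view
		explode(split(col3, ',')) lv as item
group by lv.item
order by cnt desc
limit 10;
```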


Origin blog.csdn.net/weixin_43629813/article/details/130026785