Detailed explanation of ORDER BY in Hive

The use and explanation of ORDER BY

1. Use of ORDER BY

In Hive, ORDER BY is used to sort query results. The syntax is as follows:

SELECT * FROM tab_name ORDER BY column_name;

By default, ORDER BY sorts in ascending order (ASC). String columns are sorted in lexicographic order, and numeric columns are sorted by value. The following example uses this table data:

goods              gtype                  price
chewing gum        food                   10
Potato chips       food                   20
chocolate          food                   30
cake               food                   40
French fries       food                   50
Biscuits           food                   60
bread              food                   70
moon cake          food                   80
Instant noodles    food                   90
washing powder     Daily necessities      11
Laundry detergent  Daily necessities      21
Dish soap          Daily necessities      31
Mop                Daily necessities      41
towel              Daily necessities      51
toilet paper       Daily necessities      61
Water cup          Daily necessities      71
kettle             Daily necessities      81
chair              Daily necessities      91
TV                 Electrical appliances  12
computer           Electrical appliances  22
refrigerator       Electrical appliances  32
Freezer            Electrical appliances  42
washing machine    Electrical appliances  52
the little sun     Electrical appliances  62
Electric heater    Electrical appliances  72
Induction cooker   Electrical appliances  82
air conditioning   Electrical appliances  92
Water heater       Electrical appliances  102

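In this table, price is of type int, and goods and gtype are of type string. The stored goods and gtype values are Chinese strings (for example, gtype holds 食品 for food, 日用品 for daily necessities, and 电器 for electrical appliances); they are shown translated here.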
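The column type matters because the same values can sort differently as strings and as numbers. The following is a minimal sketch in plain Python, not Hive: it takes a few price values from the table above and sorts them once as integers and once as strings. Python's string comparison is only assumed to behave like Hive's lexicographic ordering for these simple digit strings.

```python
# Illustration only: plain Python, not Hive. It mimics how the same price
# values order differently as integers (by value) and as strings
# (character by character).
prices_as_int = [10, 102, 20, 92, 91]
prices_as_str = [str(p) for p in prices_as_int]

numeric_order = sorted(prices_as_int)        # compared by numeric value
lexicographic_order = sorted(prices_as_str)  # compared character by character

print(numeric_order)        # [10, 20, 91, 92, 102]
print(lexicographic_order)  # ['10', '102', '20', '91', '92']
```

As strings, '102' sorts before '20' because the comparison stops at the first differing character. This is why price being an int column gives the expected numeric order.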
Run the following query:

select * from t_test order by price desc;

Query result:

goods              gtype                  price
Water heater       Electrical appliances  102
air conditioning   Electrical appliances  92
chair              Daily necessities      91
Instant noodles    food                   90
Induction cooker   Electrical appliances  82
kettle             Daily necessities      81
moon cake          food                   80
Electric heater    Electrical appliances  72
Water cup          Daily necessities      71
bread              food                   70
the little sun     Electrical appliances  62
toilet paper       Daily necessities      61
Biscuits           food                   60
washing machine    Electrical appliances  52
towel              Daily necessities      51
French fries       food                   50
Freezer            Electrical appliances  42
Mop                Daily necessities      41
cake               food                   40
refrigerator       Electrical appliances  32
Dish soap          Daily necessities      31
chocolate          food                   30
computer           Electrical appliances  22
Laundry detergent  Daily necessities      21
Potato chips       food                   20
TV                 Electrical appliances  12
washing powder     Daily necessities      11
chewing gum        food                   10

This query sorts in descending order (DESC), and the result shows the rows ordered from the highest price to the lowest. ORDER BY is usually applied to a single field, but it can also sort by multiple fields. With multi-field sorting, specify the sort direction after each field in the ORDER BY clause. The syntax is as follows:

SELECT * FROM tab_name ORDER BY column_1 DESC, column_2 DESC, column_3 ...;

To sort a field in descending order, add DESC after it; without it, the default is ASC. With multiple fields, the rows are first sorted by the first field in ascending or descending order. Rows that share the same value in the first field are then sorted by the second field, and so on. In effect, the first field groups the rows and the second field orders the rows within each group. The query is as follows:

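select
    goods,
    -- the literals are the stored Chinese values:
    -- 食品 = food, 日用品 = daily necessities, 电器 = electrical appliances
    case gtype
        when '食品' then concat("7_", gtype)
        when '日用品' then concat("3_", gtype)
        when '电器' then concat("5_", gtype)
    end as gtype,
    price
from t_test
order by gtype, price desc;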

The results are as follows:

goods              gtype                    price
chair              3_Daily necessities      91
kettle             3_Daily necessities      81
Water cup          3_Daily necessities      71
toilet paper       3_Daily necessities      61
towel              3_Daily necessities      51
Mop                3_Daily necessities      41
Dish soap          3_Daily necessities      31
Laundry detergent  3_Daily necessities      21
washing powder     3_Daily necessities      11
Water heater       5_Electrical appliances  102
air conditioning   5_Electrical appliances  92
Induction cooker   5_Electrical appliances  82
Electric heater    5_Electrical appliances  72
the little sun     5_Electrical appliances  62
washing machine    5_Electrical appliances  52
Freezer            5_Electrical appliances  42
refrigerator       5_Electrical appliances  32
computer           5_Electrical appliances  22
TV                 5_Electrical appliances  12
Instant noodles    7_food                   90
moon cake          7_food                   80
bread              7_food                   70
Biscuits           7_food                   60
French fries       7_food                   50
cake               7_food                   40
chocolate          7_food                   30
Potato chips       7_food                   20
chewing gum        7_food                   10

Because the gtype values are all Chinese characters, their sort order is hard to see, so each value is prefixed with a number. The result shows gtype sorted in ascending lexicographic order and, within each gtype, price sorted in descending numeric order. This example makes the ORDER BY sorting rules easy to see.
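The same multi-field rule can be tried outside Hive. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Hive, and it loads only six rows of t_test with the numeric prefix already applied to gtype. It then compares the SQL result with an equivalent Python sort key.

```python
import sqlite3

# Illustration only: SQLite stands in for Hive here, and the rows are a small
# subset of t_test with gtype already carrying the numeric prefix.
rows = [
    ("chewing gum", "7_food", 10),
    ("Instant noodles", "7_food", 90),
    ("washing powder", "3_Daily necessities", 11),
    ("chair", "3_Daily necessities", 91),
    ("TV", "5_Electrical appliances", 12),
    ("Water heater", "5_Electrical appliances", 102),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_test (goods TEXT, gtype TEXT, price INTEGER)")
conn.executemany("INSERT INTO t_test VALUES (?, ?, ?)", rows)

# gtype ascending (the default), then price descending within each gtype.
sql_result = conn.execute(
    "SELECT goods, gtype, price FROM t_test ORDER BY gtype, price DESC"
).fetchall()
conn.close()

# The same ordering as a Python sort key: negate price to get DESC.
py_result = sorted(rows, key=lambda r: (r[1], -r[2]))

for row in sql_result:
    print(row)
```

Both results list the rows in the order chair, washing powder, Water heater, TV, Instant noodles, chewing gum. The second field only decides the order among rows that tie on the first field.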

2. The execution mechanism of ORDER BY in Hive

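ORDER BY performs a global sort, so all of the data ends up in a single reducer: if the data were spread across multiple reducers, a global order could not be guaranteed. In addition, using ORDER BY in Hive is constrained by the following property: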

set hive.mapred.mode=nonstrict;
set hive.mapred.mode=strict;

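In the environment used here, the default is nonstrict mode (the default can differ between Hive versions). In strict mode, ORDER BY must be used together with the LIMIT keyword, because sorting a very large data set would take a very long time.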
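To see why a global sort cannot simply be split across several reducers, consider the toy model below. It is a sketch in plain Python, not Hive's actual execution code: the three "reducers" and the round-robin split are hypothetical, and the prices are sample values from t_test.

```python
import heapq

# Illustration only: a toy model of reducers, not Hive's execution engine.
prices = [10, 20, 30, 40, 50, 11, 21, 31, 41, 51, 12, 22, 32]

# Spread the rows across three hypothetical reducers (round-robin) and let
# each one sort only its own share.
num_reducers = 3
partitions = [sorted(prices[i::num_reducers]) for i in range(num_reducers)]

# Writing the reducer outputs one after another gives data that is sorted
# within each part, but not across the whole data set.
concatenated = [p for part in partitions for p in part]

# A global order needs one final step that sees every row, which is the
# role the single reducer plays for ORDER BY.
global_order = list(heapq.merge(*partitions))

print(concatenated == sorted(prices))  # False
print(global_order == sorted(prices))  # True
```

The final merge has to process every row in one place, which is why the cost of ORDER BY grows with the full size of the data.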
The following example shows the strict-mode restriction in action.
First, set strict mode:

set hive.mapred.mode=strict;

Then run the following SQL:

 select * from t_test order by price;

The result is as follows:

FAILED: SemanticException 1:30 Order by-s without limit are disabled for safety reasons. If you know what you are doing, 
please sethive.strict.checks.large.query to false and that hive.mapred.mode is not set to 'strict' to proceed. 
Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.. 
Error encountered near token 'price'
Error: Error while compiling statement: FAILED: SemanticException 1:30 Order by-s without limit are disabled for safety reasons. 
If you know what you are doing, please sethive.strict.checks.large.query to false and that hive.mapred.mode is not set to 'strict' to proceed. 
Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.. Error encountered near token 'price' (state=42000,code=40000)

The first line contains "Order by-s without limit": once strict mode is set, an ORDER BY statement cannot be executed without LIMIT.
After adding the LIMIT keyword, the SQL is as follows:

select * from t_test order by price limit 7;

The result is as follows:

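goods              gtype                  price
chewing gum        food                   10
washing powder     Daily necessities      11
TV                 Electrical appliances  12
Potato chips       food                   20
Laundry detergent  Daily necessities      21
computer           Electrical appliances  22
chocolate          food                   30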
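As a rough illustration of why a LIMIT keeps a sorted query manageable, the sketch below compares a full sort followed by a cut with a bounded selection of the 7 cheapest rows. It is plain Python on a subset of the t_test rows and shows the general idea only; it makes no claim about how Hive implements ORDER BY with LIMIT internally.

```python
import heapq

# Illustration only: plain Python, not Hive. Rows are (goods, gtype, price)
# taken from the t_test sample data.
rows = [
    ("chocolate", "food", 30),
    ("TV", "Electrical appliances", 12),
    ("cake", "food", 40),
    ("washing powder", "Daily necessities", 11),
    ("computer", "Electrical appliances", 22),
    ("chewing gum", "food", 10),
    ("Dish soap", "Daily necessities", 31),
    ("Laundry detergent", "Daily necessities", 21),
    ("Potato chips", "food", 20),
    ("refrigerator", "Electrical appliances", 32),
]

def price(row):
    """Return the price field of a (goods, gtype, price) row."""
    return row[2]

# Full sort, then cut: every row has to be put in order first.
full_sort_then_limit = sorted(rows, key=price)[:7]

# Bounded selection: asks only for the 7 smallest rows, in ascending order.
top_7 = heapq.nsmallest(7, rows, key=price)

for row in top_7:
    print(row)
```

Both approaches return the same seven rows as the query result above, from chewing gum (10) to chocolate (30).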

Origin blog.csdn.net/AnameJL/article/details/112413521