The use and explanation of order by
1. Use of order by
In Hive, order by is used to sort query results. The syntax is as follows:
SELECT * FROM tab_name ORDER BY column_name;
By default, order by sorts in ascending order (ASC). String columns are sorted lexicographically, and numeric columns are sorted by value. A concrete example follows.
Data in the table:
goods | gtype | price |
---|---|---|
chewing gum | food | 10 |
Potato chips | food | 20 |
chocolate | food | 30 |
cake | food | 40 |
French fries | food | 50 |
Biscuits | food | 60 |
bread | food | 70 |
moon cake | food | 80 |
Instant noodles | food | 90 |
washing powder | Daily necessities | 11 |
Laundry detergent | Daily necessities | 21 |
Dish soap | Daily necessities | 31 |
Mop | Daily necessities | 41 |
towel | Daily necessities | 51 |
toilet paper | Daily necessities | 61 |
Water cup | Daily necessities | 71 |
kettle | Daily necessities | 81 |
chair | Daily necessities | 91 |
TV | Electrical appliances | 12 |
computer | Electrical appliances | 22 |
refrigerator | Electrical appliances | 32 |
Freezer | Electrical appliances | 42 |
washing machine | Electrical appliances | 52 |
the little sun | Electrical appliances | 62 |
Electric heater | Electrical appliances | 72 |
Induction cooker | Electrical appliances | 82 |
air conditioning | Electrical appliances | 92 |
Water heater | Electrical appliances | 102 |
The field price in the table is of type int, and the rest are of type string.
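The column type matters for the sort order. As a small illustration outside Hive, the Python sketch below sorts a few prices from the table once as integers and once as strings. Python's `sorted` only stands in for the comparison rules described earlier (numeric by value, string lexicographically); it is not Hive code.

```python
# A few prices from the table, once as ints and once as strings.
prices_int = [10, 102, 20, 92, 11]
prices_str = [str(p) for p in prices_int]

# Numeric sort compares the values themselves.
numeric_order = sorted(prices_int)

# String sort compares character by character, so "102" comes before "11".
lexicographic_order = sorted(prices_str)

print(numeric_order)        # [10, 11, 20, 92, 102]
print(lexicographic_order)  # ['10', '102', '11', '20', '92']
```

If price had been declared as a string column, a sort on it would follow the second ordering rather than the first.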
Run the following query:
select * from t_test order by price desc;
Query result:
goods | gtype | price |
---|---|---|
Water heater | Electrical appliances | 102 |
air conditioning | Electrical appliances | 92 |
chair | Daily necessities | 91 |
Instant noodles | food | 90 |
Induction cooker | Electrical appliances | 82 |
kettle | Daily necessities | 81 |
moon cake | food | 80 |
Electric heater | Electrical appliances | 72 |
Water cup | Daily necessities | 71 |
bread | food | 70 |
the little sun | Electrical appliances | 62 |
toilet paper | Daily necessities | 61 |
Biscuits | food | 60 |
washing machine | Electrical appliances | 52 |
towel | Daily necessities | 51 |
French fries | food | 50 |
Freezer | Electrical appliances | 42 |
Mop | Daily necessities | 41 |
cake | food | 40 |
refrigerator | Electrical appliances | 32 |
Dish soap | Daily necessities | 31 |
chocolate | food | 30 |
computer | Electrical appliances | 22 |
Laundry detergent | Daily necessities | 21 |
Potato chips | food | 20 |
TV | Electrical appliances | 12 |
washing powder | Daily necessities | 11 |
chewing gum | food | 10 |
Here I sort in descending order (desc). You can see that the rows are sorted by the price field from largest to smallest. Usually order by sorts on a single field, but sometimes multi-field sorting is needed. With multi-field sorting, specify the sort direction after each field in the order by clause. The syntax is as follows:
SELECT * FROM tab_name ORDER BY column_1 desc, column_2 desc, column_3....
To sort a field in descending order, add desc after it. If you omit it, the default is asc. With multiple fields, the rows are first sorted by the first field, in ascending or descending order. Rows whose first-field values are equal are then sorted by the second field, again in ascending or descending order. You can think of it as grouping by the value of the first field and then sorting by the second field within each group. The query is as follows:
select
  goods,
  case gtype
    when '食品' then concat("7_", gtype)    -- food
    when '日用品' then concat("3_", gtype)  -- daily necessities
    when '电器' then concat("5_", gtype)    -- electrical appliances
  end gtype,
  price
from t_test
order by gtype, price desc;
The results are as follows:
goods | gtype | price |
---|---|---|
chair | 3_Daily necessities | 91 |
kettle | 3_Daily necessities | 81 |
Water cup | 3_Daily necessities | 71 |
toilet paper | 3_Daily necessities | 61 |
towel | 3_Daily necessities | 51 |
Mop | 3_Daily necessities | 41 |
Dish soap | 3_Daily necessities | 31 |
Laundry detergent | 3_Daily necessities | 21 |
washing powder | 3_Daily necessities | 11 |
Water heater | 5_Electrical appliances | 102 |
air conditioning | 5_Electrical appliances | 92 |
Induction cooker | 5_Electrical appliances | 82 |
Electric heater | 5_Electrical appliances | 72 |
the little sun | 5_Electrical appliances | 62 |
washing machine | 5_Electrical appliances | 52 |
Freezer | 5_Electrical appliances | 42 |
refrigerator | 5_Electrical appliances | 32 |
computer | 5_Electrical appliances | 22 |
TV | 5_Electrical appliances | 12 |
Instant noodles | 7_food | 90 |
moon cake | 7_food | 80 |
bread | 7_food | 70 |
Biscuits | 7_food | 60 |
French fries | 7_food | 50 |
cake | 7_food | 40 |
chocolate | 7_food | 30 |
Potato chips | 7_food | 20 |
chewing gum | 7_food | 10 |
Because the gtype values in the source data are all Chinese characters, their sort order is hard to see at a glance, so each value is prefixed with a number. The results show that gtype is sorted in ascending lexicographic order and that, within each gtype, price is sorted in descending numeric order. This example makes the sorting rules of order by easy to see.
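The multi-field rule can also be mimicked in plain Python, as a rough sketch rather than Hive code. Here `rows` is a hypothetical subset of the table, with unprefixed gtype labels, and a tuple sort key plays the role of `order by gtype, price desc`.

```python
# A hypothetical subset of the table: (goods, gtype, price).
rows = [
    ("chewing gum", "food", 10),
    ("chair", "Daily necessities", 91),
    ("TV", "Electrical appliances", 12),
    ("Instant noodles", "food", 90),
    ("washing powder", "Daily necessities", 11),
    ("Water heater", "Electrical appliances", 102),
]

# Compare gtype first (ascending); only when gtype ties, compare price.
# Negating the price turns the ascending sort into a descending one.
ordered = sorted(rows, key=lambda r: (r[1], -r[2]))

for goods, gtype, price in ordered:
    print(f"{goods} | {gtype} | {price}")
```

The rows come out grouped by gtype, and inside each group the higher price is listed first, which is the same pattern as in the result table.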
2. The execution mechanism of order by in hive
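order by performs a global sort, so all of the data ends up in a single reducer. If the data were spread across multiple reducers, a global order could not be guaranteed. In addition, order by in Hive is constrained by the following property: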
set hive.mapred.mode=nonstrict;
set hive.mapred.mode=strict;
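The default here is nonstrict mode (the default may differ between Hive versions). In strict mode, order by must be used together with the limit keyword, because with a large amount of data the query would take a very long time to run.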
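Why a global sort needs a single reducer can be seen with a toy model. In the Python sketch below, two lists stand in for the output of two reducers; the way the rows are split is arbitrary and hypothetical. Each list is sorted on its own, but putting them one after the other does not give a globally sorted result, whereas sorting everything in one place does. This is a conceptual illustration only, not a description of Hive's implementation.

```python
prices = [10, 20, 30, 40, 11, 21, 31, 41]

# Hypothetical split of the rows across two reducers, each sorting its own share.
reducer_a = sorted(prices[0::2])  # [10, 11, 30, 31]
reducer_b = sorted(prices[1::2])  # [20, 21, 40, 41]

# Output of the two reducers, one after the other.
concatenated = reducer_a + reducer_b

# Everything sorted in one place, as with a single reducer.
single_reducer = sorted(prices)


def is_sorted(values):
    """Return True if the values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_sorted(concatenated))    # False
print(is_sorted(single_reducer))  # True
```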
Consider the following example.
First, set the mode to strict:
set hive.mapred.mode=strict;
Then run the following SQL:
select * from t_test order by price;
The result is as follows:
FAILED: SemanticException 1:30 Order by-s without limit are disabled for safety reasons. If you know what you are doing,
please sethive.strict.checks.large.query to false and that hive.mapred.mode is not set to 'strict' to proceed.
Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features..
Error encountered near token 'price'
Error: Error while compiling statement: FAILED: SemanticException 1:30 Order by-s without limit are disabled for safety reasons.
If you know what you are doing, please sethive.strict.checks.large.query to false and that hive.mapred.mode is not set to 'strict' to proceed.
Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.. Error encountered near token 'price' (state=42000,code=40000)
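The first line contains "Order by-s without limit", which shows that once strict mode is set, an order by statement cannot be executed without limit.
After adding the limit keyword, the SQL is as follows: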
select * from t_test order by price limit 7;
The result is as follows:
goods | gtype | price |
---|---|---|
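chewing gum | food | 10 |
washing powder | Daily necessities | 11 |
TV | Electrical appliances | 12 |
Potato chips | food | 20 |
Laundry detergent | Daily necessities | 21 |
computer | Electrical appliances | 22 |
chocolate | food | 30 |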
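With limit, only the first rows of the sorted result are returned. The Python sketch below reproduces the prices in the result above from the prices in the table in section 1. `heapq.nsmallest` is used purely for illustration and says nothing about how Hive evaluates the query internally.

```python
import heapq

# Prices from the table in section 1, one range per gtype.
prices = (
    list(range(10, 91, 10))     # food: 10, 20, ..., 90
    + list(range(11, 92, 10))   # daily necessities: 11, 21, ..., 91
    + list(range(12, 103, 10))  # electrical appliances: 12, 22, ..., 102
)

# Counterpart of "order by price limit 7": the 7 smallest prices, ascending.
top7 = heapq.nsmallest(7, prices)
print(top7)  # [10, 11, 12, 20, 21, 22, 30]
```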