[Hive, Part 5] HQL Queries

1. Structure of a query statement
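
As a rough illustration of how the clauses covered below fit together, here is a minimal sketch of a single query. The table employee(name, dept, salary) is hypothetical and does not come from this post.

```sql
-- Hypothetical table: employee(name STRING, dept STRING, salary DOUBLE).
SELECT dept, count(*) AS cnt   -- columns and aggregates to return
FROM employee                  -- source table
WHERE salary > 1000            -- row filter, applied before grouping
GROUP BY dept                  -- one output row per department
ORDER BY cnt DESC              -- global sort of the grouped result
LIMIT 10;                      -- cap on the number of returned rows
```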

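2. Meaning of the query keywords

2.1 LIMIT

Like LIMIT in MySQL; caps the number of rows a query returns.

2.2 WHERE

Like WHERE in MySQL; specifies the filter condition for a query.

2.3 GROUP BY

Groups rows so that aggregates can be computed per group.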

2.4 ORDER BY

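Sorts the entire result globally.
Launches only one reduce task.
Can therefore be very slow on large inputs.
In strict mode, it must be used together with LIMIT.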
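
The strict-mode rule can be sketched as follows. This assumes the hive.mapred.mode setting as it exists in the Hive 0.14 release shown in the logs later in this post, and again uses a hypothetical employee table.

```sql
-- Hypothetical table: employee(name STRING, dept STRING, salary DOUBLE).
SET hive.mapred.mode=strict;

-- Rejected in strict mode, because ORDER BY has no LIMIT:
-- SELECT name, salary FROM employee ORDER BY salary DESC;

-- Accepted: LIMIT bounds the size of the globally sorted result.
SELECT name, salary FROM employee ORDER BY salary DESC LIMIT 10;
```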

2.5 SORT BY

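Can run with multiple reduce tasks (how is the number determined?).
Rows are sorted within each reduce task, but there is no global order.
Usually combined with DISTRIBUTE BY, which determines the reduce task each row is sent to.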
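
On the question of how the reducer count is determined: it can be fixed explicitly, or left for Hive to estimate from the input size. The sketch below shows the relevant settings; the values are illustrative and the employee table is hypothetical.

```sql
-- Option 1: fix the number of reduce tasks (Hadoop 2 property name).
SET mapreduce.job.reduces=3;

-- Option 2 (when the count is not fixed): Hive estimates it as roughly
-- input size / bytes per reducer, capped by the configured maximum.
-- SET hive.exec.reducers.bytes.per.reducer=268435456;
-- SET hive.exec.reducers.max=100;

-- Each of the 3 reducers writes its own output sorted by salary.
SELECT name, salary FROM employee SORT BY salary DESC;
```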

2.6 DISTRIBUTE BY

Plays the role of the partitioner in MapReduce; by default it is hash-based.
Works well in combination with SORT BY.

2.7 CLUSTER BY

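When DISTRIBUTE BY and SORT BY (ascending order) are used together on the same columns, CLUSTER BY can be used as shorthand. CLUSTER BY always sorts in ascending order and does not accept ASC or DESC.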

2.8 Examples of SORT BY, DISTRIBUTE BY, and CLUSTER BY
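
A minimal sketch of the three clauses side by side. The employee table and the reducer count are assumptions made for illustration, not part of the original post.

```sql
-- Hypothetical table: employee(name STRING, dept STRING, salary DOUBLE).
SET mapreduce.job.reduces=3;

-- SORT BY alone: each reducer's output is sorted by salary; no global order.
SELECT name, dept, salary FROM employee SORT BY salary DESC;

-- DISTRIBUTE BY + SORT BY: all rows of a dept go to the same reducer, and
-- each reducer orders its rows by dept, then by salary descending.
SELECT name, dept, salary FROM employee
DISTRIBUTE BY dept SORT BY dept, salary DESC;

-- CLUSTER BY dept is shorthand for DISTRIBUTE BY dept SORT BY dept.
SELECT name, dept, salary FROM employee CLUSTER BY dept;
```

The second form is the usual way to get per-group ordering without funnelling everything through the single reducer that ORDER BY requires.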

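3. Join queries

3.1 Joins supported by Hive

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI-JOIN
Map-side Joins
In the Hive version used here (0.14.0), only equality joins are supported; non-equality join conditions are not.

Example: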

hive> SELECT w.id FROM word w JOIN my_word m ON w.id = m.id;
Query ID = hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a
Total jobs = 1
15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a.log
2015-03-10 02:28:41	Starting to launch local task to process map join;	maximum memory = 477102080
2015-03-10 02:28:49	Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2015-03-10 02:28:49	Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (320 bytes)
2015-03-10 02:28:49	End of local task; Time Taken: 7.816 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0004, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0004/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job  -kill job_1425868733189_0004
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-03-10 02:29:17,976 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:29:32,438 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 3.33 sec
MapReduce Total cumulative CPU time: 3 seconds 330 msec
Ended Job = job_1425868733189_0004
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 3.33 sec   HDFS Read: 254 HDFS Write: 13 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
1
10
10
1000
Time taken: 80.261 seconds, Fetched: 4 row(s)

3.2 Map side Join

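The join is performed inside the map tasks, so no reduce task needs to be launched.
Suited to joining one large table with one small table.
Idea: the small table is copied to every node and loaded into memory; the large table is split, and each split is joined against the in-memory small table.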
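
Hive can also choose a map-side join on its own when one side is small enough, which is consistent with the "local task to process map join" lines in the log of section 3.1. A sketch of the settings involved, using the word and my_word tables from this post (the threshold value shown is the documented default):

```sql
-- Let Hive convert a join into a map-side join automatically.
SET hive.auto.convert.join=true;
-- Size in bytes below which a table counts as "small".
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT w.id FROM word w JOIN my_word m ON w.id = m.id;
```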

3.3 Reduce side Join

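Suited to joining two large tables.
Idea: the map side hashes rows on the join key, so that rows with the same key reach the same reducer; the reduce side performs the join.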
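
To see a reduce-side plan for the small tables used in this post, automatic map-join conversion can be switched off and the plan inspected with EXPLAIN. This is a sketch; the exact plan text depends on the Hive version.

```sql
-- Disable automatic conversion so the join runs as a common (reduce-side) join.
SET hive.auto.convert.join=false;

-- Print the plan instead of running the query; with conversion disabled,
-- the join operator is expected to appear in the reduce phase.
EXPLAIN SELECT w.id FROM word w JOIN my_word m ON w.id = m.id;
```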

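For contrast, a map-side join can be requested explicitly with the MAPJOIN hint (here b is the small table that is loaded into memory):

SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key;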

3.4 LEFT SEMI-JOIN (left semi join)

select word.id from word left semi join my_word on (word.id=my_word.id);

Example:

hive> select word.id from word left semi join my_word on (word.id=my_word.id); 
Query ID = hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b
Total jobs = 1
15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b.log
2015-03-10 02:09:34	Starting to launch local task to process map join;	maximum memory = 477102080
2015-03-10 02:09:42	Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2015-03-10 02:09:43	Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (316 bytes)
2015-03-10 02:09:43	End of local task; Time Taken: 8.098 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0003, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0003/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job  -kill job_1425868733189_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-03-10 02:12:42,201 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:13:42,866 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:14:17,089 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 13.16 sec
MapReduce Total cumulative CPU time: 13 seconds 160 msec
Ended Job = job_1425868733189_0003
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 13.16 sec   HDFS Read: 254 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 160 msec
OK
1
10
1000
Time taken: 451.347 seconds, Fetched: 3 row(s)
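
Note the difference from the inner join in section 3.1: that query returned id 10 twice (4 rows), presumably because my_word contains 10 more than once, while the left semi join returns each matching row of word only once (3 rows). A left semi join behaves like an IN subquery; assuming Hive 0.13 or later, which accepts IN subqueries in WHERE, the same result can be sketched as:

```sql
-- Same semantics as the LEFT SEMI JOIN above: keep a row of word
-- if at least one row in my_word has the same id.
SELECT word.id FROM word
WHERE word.id IN (SELECT my_word.id FROM my_word);
```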
