【Hive Part 5】HQL Queries

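1. Structure of a Query Statement

A Hive SELECT statement is built from the clauses explained in section 2, in roughly this order:

  SELECT [ALL | DISTINCT] select_expr, select_expr, ...
  FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_list]
  [ORDER BY col_list]
  [CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
  [LIMIT n]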

2. Meaning of the Query Keywords

2.1 LIMIT

Similar to LIMIT in MySQL: restricts the number of rows the query returns.

2.2 WHERE

Similar to WHERE in MySQL: specifies the conditions a row must satisfy to be included in the result.

2.3 GROUP BY

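Groups rows by one or more columns so that aggregate functions (COUNT, SUM, ...) are computed once per group.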
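
On the MapReduce engine used in this article, Hive runs a GROUP BY roughly as a shuffle on the grouping key: map tasks emit (key, value) pairs, all pairs with the same key meet in one reducer, and the reducer computes the aggregate. The plain-Python sketch below models that flow for a query like SELECT dept, COUNT(*) ... GROUP BY dept. The table, its columns and the data are hypothetical, and the sketch leaves out Hive's map-side partial aggregation (hive.map.aggr).

```python
from collections import defaultdict

# Hypothetical rows of a table with columns (dept, name).
rows = [("dev", "ann"), ("ops", "bob"), ("dev", "cid"), ("hr", "dan"), ("dev", "eve")]


def map_phase(rows):
    # Map: emit (group key, 1) for every row, which is what COUNT(*) needs.
    for dept, _name in rows:
        yield dept, 1


def shuffle(pairs):
    # Shuffle: bring together all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: aggregate each group independently of the others.
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle(map_phase(rows)))
print(sorted(counts.items()))  # [('dev', 3), ('hr', 1), ('ops', 1)]
```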

2.4 ORDER BY

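  • Produces a global (total) ordering of the result
  • Uses only a single reduce task
  • Can therefore be very slow on large data sets
  • In strict mode (hive.mapred.mode=strict), ORDER BY must be accompanied by LIMIT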
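
A minimal model of why ORDER BY is slow: every row produced by every map task is sent to the same single reducer, which has to sort the entire data set by itself. The map outputs below are hypothetical; the point is that the reducer's input is the whole table, regardless of how many map tasks ran. In this model LIMIT only shrinks what the reducer writes out, which is the reason strict mode insists on it.

```python
# Hypothetical map outputs: each inner list is what one map task produced.
map_outputs = [[42, 7, 19], [3, 88], [15, 61, 2]]


def order_by(map_outputs):
    # ORDER BY: all map output is routed to one reduce task ...
    single_reducer_input = [row for output in map_outputs for row in output]
    # ... which sorts everything, yielding a total (global) order.
    return sorted(single_reducer_input)


def order_by_limit(map_outputs, n):
    # ORDER BY ... LIMIT n: same single reducer, but only n rows are returned.
    return order_by(map_outputs)[:n]


print(order_by(map_outputs))           # [2, 3, 7, 15, 19, 42, 61, 88]
print(order_by_limit(map_outputs, 3))  # [2, 3, 7]
```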

2.5 SORT BY

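  • Can use multiple reduce tasks. Their number comes from mapreduce.job.reduces (older name mapred.reduce.tasks); when that is -1, Hive estimates it from the input size using hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max
  • Rows are sorted within each reduce task, but there is no global order
  • Usually combined with DISTRIBUTE BY, which controls which reduce task each row is sent to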

2.6 DISTRIBUTE BY

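  • Plays the role of the partitioner in MapReduce; by default, rows are assigned to reduce tasks by hashing the DISTRIBUTE BY columns
  • Combined with SORT BY, it gives each reduce task a well-defined set of keys, sorted within that task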
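
DISTRIBUTE BY behaves like MapReduce's default hash partitioner: a row goes to reducer hash(key) mod numReduceTasks, so all rows sharing a key end up in the same reduce task. The sketch below assumes integer keys, two reduce tasks and hypothetical data, and uses the key's own value as its hash purely for illustration; Hive's actual hash function depends on the column type.

```python
# Hypothetical rows (user_id, amount); user_id is the DISTRIBUTE BY column.
rows = [(1, 30), (2, 5), (3, 12), (1, 8), (4, 50), (2, 21)]
NUM_REDUCERS = 2  # assumed number of reduce tasks


def partition(key, num_reducers):
    # Same shape as Hadoop's HashPartitioner: (hash & Integer.MAX_VALUE) % numReduceTasks.
    # The integer key stands in for its own hash (an illustrative choice).
    return (key & 0x7FFFFFFF) % num_reducers


def distribute_by(rows, num_reducers):
    # Route every row to the reducer chosen by its key; order within a bucket is arrival order.
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[partition(row[0], num_reducers)].append(row)
    return buckets


buckets = distribute_by(rows, NUM_REDUCERS)
for i, bucket in enumerate(buckets):
    print(i, bucket)
# 0 [(2, 5), (4, 50), (2, 21)]
# 1 [(1, 30), (3, 12), (1, 8)]
```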

2.7 CLUSTER BY

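  • When DISTRIBUTE BY and SORT BY are used together on the same columns in ascending order, they can be abbreviated as CLUSTER BY; CLUSTER BY cannot sort in descending order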

2.8 Examples of SORT BY, DISTRIBUTE BY and CLUSTER BY
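
The following plain-Python simulation (not Hive itself) illustrates the three clauses on hypothetical (k, v) rows with an assumed two reduce tasks; as a simplification, the integer key serves as its own hash. It shows that DISTRIBUTE BY k SORT BY k yields output sorted within each reducer but not across reducers, and that CLUSTER BY k gives the same per-reducer result.

```python
from operator import itemgetter

NUM_REDUCERS = 2  # assumed number of reduce tasks

# Hypothetical rows (k, v).
rows = [(5, "a"), (2, "b"), (8, "c"), (1, "d"), (4, "e"), (7, "f"), (2, "g")]
k = itemgetter(0)  # the column used in DISTRIBUTE BY / SORT BY / CLUSTER BY


def distribute_by(rows, key, n):
    # Route each row to reducer key % n (integer key used as its own hash).
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[key(row) % n].append(row)
    return buckets


def sort_by(buckets, key):
    # Sort inside each reducer only; nothing orders rows across reducers.
    return [sorted(bucket, key=key) for bucket in buckets]


def cluster_by(rows, key, n):
    # CLUSTER BY k is shorthand for DISTRIBUTE BY k SORT BY k (ascending).
    return sort_by(distribute_by(rows, key, n), key)


per_reducer = sort_by(distribute_by(rows, k, NUM_REDUCERS), k)
for i, out in enumerate(per_reducer):
    print("reducer", i, out)

final = [row for out in per_reducer for row in out]
print([k(row) for row in final])  # [2, 2, 4, 8, 1, 5, 7]: sorted per reducer, not globally
print(cluster_by(rows, k, NUM_REDUCERS) == per_reducer)  # True
```

A descending order requires the long form, DISTRIBUTE BY k SORT BY k DESC, because CLUSTER BY only sorts in ascending order.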

 

 

 

3. Join Queries

3.1 Joins Supported by Hive

  • INNER JOIN
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
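  • LEFT SEMI JOIN
  • Map-side joins
  • Only equality (equi-join) conditions are supported; non-equality join conditions are not (in the Hive 0.14 used here)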
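
To make the outer-join variants concrete, here is a naive nested-loop equi-join in plain Python over two hypothetical tables, with None standing in for SQL NULL. It mirrors only the result semantics of the join types listed above (the single equality predicate reflects the equi-join restriction), not how Hive executes them.

```python
# Hypothetical tables: lists of (id, value) rows; id is the join column.
a = [(1, "a1"), (2, "a2"), (3, "a3")]
b = [(2, "b2"), (3, "b3"), (4, "b4")]


def equi_join(left, right, how="inner"):
    """Naive nested-loop equi-join; None stands in for SQL NULL padding."""
    result = []
    matched_right = set()  # indexes of right rows that found a partner
    for lrow in left:
        found = False
        for i, rrow in enumerate(right):
            if lrow[0] == rrow[0]:  # equality is the only join predicate
                result.append((lrow, rrow))
                matched_right.add(i)
                found = True
        if not found and how in ("left", "full"):
            result.append((lrow, None))
    if how in ("right", "full"):
        result.extend((None, rrow) for i, rrow in enumerate(right) if i not in matched_right)
    return result


def join_keys(pairs):
    # Show only the ids; None marks the side padded with NULLs.
    return [(lrow[0] if lrow else None, rrow[0] if rrow else None) for lrow, rrow in pairs]


for how in ("inner", "left", "right", "full"):
    print(how, join_keys(equi_join(a, b, how)))
# inner [(2, 2), (3, 3)]
# left [(1, None), (2, 2), (3, 3)]
# right [(2, 2), (3, 3), (None, 4)]
# full [(1, None), (2, 2), (3, 3), (None, 4)]
```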

Example:

hive> SELECT w.id FROM word w join my_word m on w.id = m.id;
Query ID = hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a
Total jobs = 1
15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a.log
2015-03-10 02:28:41	Starting to launch local task to process map join;	maximum memory = 477102080
2015-03-10 02:28:49	Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2015-03-10 02:28:49	Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (320 bytes)
2015-03-10 02:28:49	End of local task; Time Taken: 7.816 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0004, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0004/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job  -kill job_1425868733189_0004
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-03-10 02:29:17,976 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:29:32,438 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 3.33 sec
MapReduce Total cumulative CPU time: 3 seconds 330 msec
Ended Job = job_1425868733189_0004
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 3.33 sec   HDFS Read: 254 HDFS Write: 13 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
1
10
10
1000
Time taken: 80.261 seconds, Fetched: 4 row(s)

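3.2 Map-side Join

  • The join is performed inside the map tasks, so no reduce task is needed
  • Suited to joining one large table with one small table
  • Idea: the small table is copied to every node and loaded into memory; the large table is split, and each split is joined against the in-memory copy of the small table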
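
The idea above is essentially a broadcast hash join. The transcripts in this article show Hive preparing one in a local task (it dumps the small table into a .hashtable file) and then running the job with 0 reducers. The plain-Python sketch below models that flow with hypothetical tables; the split layout and function names are illustrative, not Hive's API.

```python
# Hypothetical small table (id, name), small enough to fit in memory.
small_table = [(1, "one"), (10, "ten"), (1000, "thousand")]

# Hypothetical large table, already cut into splits (one per map task).
big_table_splits = [
    [(1, "x"), (2, "y")],
    [(10, "z"), (1000, "w"), (7, "v")],
]


def build_hash_table(rows):
    # Done once: load the small table into an in-memory hash table on the join column.
    table = {}
    for key, value in rows:
        table.setdefault(key, []).append(value)
    return table


def map_task(split, hash_table):
    # Each map task streams its split of the big table and probes the hash table;
    # matches are emitted directly, so no shuffle or reduce phase is needed.
    for key, big_value in split:
        for small_value in hash_table.get(key, []):
            yield key, big_value, small_value


hash_table = build_hash_table(small_table)  # in a cluster, shipped to every map task
output = [row for split in big_table_splits for row in map_task(split, hash_table)]
print(output)  # [(1, 'x', 'one'), (10, 'z', 'ten'), (1000, 'w', 'thousand')]
```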

3.3 Reduce-side Join

  • Suited to joining two large tables
  • Idea: the map side partitions rows by a hash of the join key, and the reduce side performs the join
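
A plain-Python sketch of that reduce-side (shuffle) join, assuming two hypothetical tables and two reduce tasks: the map phase tags each row with its source table and keys it by the join column, the shuffle partitions by a hash of that key, and each reducer pairs the left and right rows that share a key (inner-join semantics). Python's hash() of a small int is the int itself, which keeps the partitioning deterministic here; the tags "L"/"R" are an illustrative choice.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed number of reduce tasks

# Hypothetical large tables as (join_key, payload) rows.
orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]
users = [(1, "u1"), (2, "u2"), (4, "u4")]


def map_phase(table, tag):
    # Map: tag each row with its source table and key it by the join column.
    for key, payload in table:
        yield key, (tag, payload)


def shuffle(pairs, n):
    # Partition by hash of the join key; rows with equal keys meet in one reducer.
    reducers = [defaultdict(list) for _ in range(n)]
    for key, tagged in pairs:
        reducers[hash(key) % n][key].append(tagged)
    return reducers


def reduce_phase(groups):
    # Reduce: for each key, pair every left row with every right row.
    for key, tagged_rows in groups.items():
        left_rows = [p for tag, p in tagged_rows if tag == "L"]
        right_rows = [p for tag, p in tagged_rows if tag == "R"]
        for lrow in left_rows:
            for rrow in right_rows:
                yield key, lrow, rrow


pairs = list(map_phase(orders, "L")) + list(map_phase(users, "R"))
joined = [row for groups in shuffle(pairs, NUM_REDUCERS) for row in reduce_phase(groups)]
print(sorted(joined))  # [(1, 'o1', 'u1'), (1, 'o3', 'u1'), (2, 'o2', 'u2')]
```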

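Example of a map-side join (3.2), requested with the MAPJOIN hint on the small table b. Since Hive 0.11 the hint is ignored by default (hive.ignore.mapjoin.hint=true), and Hive instead converts joins against small tables automatically (hive.auto.convert.join=true), as both transcripts in this article show ("Starting to launch local task to process map join", "number of reducers: 0"):

SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key;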

3.4 LEFT SEMI JOIN

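LEFT SEMI JOIN returns each row of the left table that has at least one match in the right table, and returns it only once. It is Hive's efficient way of expressing uncorrelated IN / EXISTS subqueries; the right-hand table may be referenced only in the ON clause, not in SELECT or WHERE:

SELECT word.id FROM word LEFT SEMI JOIN my_word ON (word.id = my_word.id);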
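
The transcripts in 3.1 and 3.4 query the same two tables: the inner join returned 10 twice, while the semi join returned it once, which suggests my_word holds two rows with id 10. The plain-Python sketch below reproduces that difference on hypothetical id lists chosen to match those outputs; it models only the semantics, not Hive's execution.

```python
# Hypothetical id columns; my_word contains id 10 twice.
word = [1, 10, 1000, 5]
my_word = [1, 10, 10, 1000]


def inner_join_ids(left, right):
    # Inner join: one output row per matching (left, right) pair.
    return [l_id for l_id in left for r_id in right if l_id == r_id]


def left_semi_join_ids(left, right):
    # LEFT SEMI JOIN: keep a left row if any right row matches, emitting it at
    # most once (like IN / EXISTS); right-hand columns never reach the output.
    right_keys = set(right)
    return [l_id for l_id in left if l_id in right_keys]


print(inner_join_ids(word, my_word))      # [1, 10, 10, 1000]
print(left_semi_join_ids(word, my_word))  # [1, 10, 1000]
```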

Example:

hive> select word.id from word left semi join my_word on (word.id=my_word.id); 
Query ID = hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b
Total jobs = 1
15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b.log
2015-03-10 02:09:34	Starting to launch local task to process map join;	maximum memory = 477102080
2015-03-10 02:09:42	Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2015-03-10 02:09:43	Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (316 bytes)
2015-03-10 02:09:43	End of local task; Time Taken: 8.098 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0003, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0003/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job  -kill job_1425868733189_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-03-10 02:12:42,201 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:13:42,866 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:14:17,089 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 13.16 sec
MapReduce Total cumulative CPU time: 13 seconds 160 msec
Ended Job = job_1425868733189_0003
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 13.16 sec   HDFS Read: 254 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 160 msec
OK
1
10
1000
Time taken: 451.347 seconds, Fetched: 3 row(s)


Reposted from bit1129.iteye.com/blog/2191001