Impala: impala-shell, and connecting to Impala from Java via JDBC

Query processing interfaces

Impala provides three interfaces for processing queries:

1. impala-shell: after setting up Impala with the Cloudera VM, you can start the Impala shell by typing the impala-shell command in a terminal. The Impala shell is covered in more detail in the next part.

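2. Hue interface: you can process Impala queries from the Hue web UI. Hue provides an Impala query editor where you can type and execute Impala queries. To use this editor, you first need to log in to Hue.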

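3. ODBC/JDBC drivers: like other databases, Impala provides ODBC/JDBC drivers. With them you can connect to Impala from any programming language that supports these drivers and build applications that run queries in Impala.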

Query execution process

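1. Whenever a user submits a query through any of these interfaces, one of the impalad daemons in the cluster accepts it. That impalad acts as the coordinator for this particular query.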

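The impalad parses the query into a concrete execution plan (the Planner step) and hands the plan to the Coordinator on the same machine, which acts as the central coordinating node.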

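Following the plan, the Coordinator runs its own part of the work through the local Executor and forwards the rest to the other impalads that hold relevant data, whose Executors execute it.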

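2. After receiving the query, the coordinator checks that the query is valid against the table schema stored in the Hive metastore. It then obtains from the HDFS NameNode the locations of the data the query needs and sends this information to the other impalads so they can execute the query.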

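3. Each of the other Impala daemons reads the specified data blocks and processes its part of the query. Once all daemons have finished, the coordinator collects the results and returns them to the user.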

When each impalad's Executor finishes, it returns its results to the central Coordinator, which merges them and returns the final result to the client.
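The coordinator/executor flow above is essentially scatter-gather. The toy Java model below illustrates it for a count(*), under loud assumptions: it is not Impala code, the "nodes" are plain in-memory arrays standing in for HDFS blocks, and ScatterGatherSketch, executeFragment and coordinateCount are hypothetical names. Each executor counts the rows it holds locally, and the coordinator sums the partial counts before answering the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy scatter-gather model of the coordinator/executor flow (not Impala's code).
class ScatterGatherSketch {

    // Hypothetical executor on one impalad: counts the rows in its local blocks.
    static long executeFragment(List<long[]> localBlocks) {
        long count = 0;
        for (long[] block : localBlocks) {
            count += block.length; // each array element stands for one row
        }
        return count;
    }

    // Hypothetical coordinator: sends the fragment to every node in parallel,
    // then merges (sums) the partial counts before returning to the client.
    static long coordinateCount(List<List<long[]>> blocksPerNode) {
        List<CompletableFuture<Long>> partials = new ArrayList<>();
        for (List<long[]> nodeBlocks : blocksPerNode) {
            partials.add(CompletableFuture.supplyAsync(() -> executeFragment(nodeBlocks)));
        }
        long total = 0;
        for (CompletableFuture<Long> partial : partials) {
            total += partial.join(); // wait for this executor's partial result
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<long[]>> cluster = List.of(
                List.of(new long[1500], new long[500]), // node 1 holds two blocks
                List.of(new long[2018]));               // node 2 holds one block
        System.out.println(coordinateCount(cluster));   // prints 4018
    }
}
```

Running main prints 4018, the same row count as the weather table used below, split across two hypothetical nodes. In real Impala the per-node partial aggregation and the final merge are separate plan fragments; here they are collapsed into two methods to keep the idea visible.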

I. impala-shell usage, with an example of querying Hive data

[root@hadoop06 ~]# impala-shell -h

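Some of the command-line options of impala-shell:
-h (--help): show help
-v (--version): show version information
-p (--show_profiles): print the query profile, including the execution plan, after each query
-k (--kerberos): connect using Kerberos authentication
-l (--ldap): use LDAP authentication
-u (--user): the user name to authenticate as when LDAP is enabled

-i hostname (--impalad=hostname): the impalad to connect to, in hostname:port form; the default port is 21000.
   By default impala-shell connects to the impalad on the local host.
-q query (--query=query): run the given SQL statement from the command line without entering the interactive shell
-d default_db (--database=default_db): the database to use
-B (--delimited): print results as plain delimited text instead of pretty-printed tables; pretty-printing large result sets hurts performance
--output_delimiter=character: the field delimiter for -B output (tab by default), handy for piping into other commands
--print_header: print column names along with -B output (not printed by default)
-f query_file (--query_file=query_file): run the statements in a file; statements are separated by semicolons.
   It is advisable to keep each SQL statement on a single line, since the shell reads the file line by line.
-o filename (--output_file=filename): write query results to the given file
-c (--ignore_query_failure): continue with the remaining statements when a query fails
-r (--refresh_after_connect): refresh all metadata after connecting (needed, for example, after a table is created in Hive, before Impala can see the change).
   This is a full refresh of all metadata, so use it only as a last resort.
   Do not refresh Hive metadata on a schedule: when there is a lot of metadata, a single full refresh can bring Impala down.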
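Because -q, -B, --output_delimiter and --print_header let impala-shell run non-interactively and emit plain delimited text, it is easy to drive from another program. A minimal Java sketch, assuming impala-shell is on the PATH and that the result rows go to standard output while the shell's progress messages go to standard error; ImpalaShellCommand, build, run and parse are hypothetical names. ProcessBuilder passes the SQL as a single argument, so no shell quoting is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that drives impala-shell non-interactively.
class ImpalaShellCommand {

    // impala-shell -i host:port -B --output_delimiter=, --print_header -q "<sql>"
    static List<String> build(String hostPort, String sql) {
        return List.of("impala-shell", "-i", hostPort, "-B",
                "--output_delimiter=,", "--print_header", "-q", sql);
    }

    // Runs the command (requires impala-shell on the PATH). stderr is inherited
    // (assumed to carry progress messages); stdout is returned as a string.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    // Splits -B output into rows; with --print_header the first row holds the column names.
    static List<String[]> parse(String output) {
        List<String[]> rows = new ArrayList<>();
        for (String line : output.split("\n")) {
            if (!line.isEmpty()) {
                rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> cmd = build("quickstart.cloudera:21000",
                "select count(*) from weather.weather_everydate_detail");
        System.out.println(String.join(" ", cmd));
        // Hypothetical sample of -B --print_header output for a count query.
        List<String[]> rows = parse("count(*)\n4018\n");
        System.out.println(rows.get(0)[0] + " = " + rows.get(1)[0]); // prints count(*) = 4018
    }
}
```

The parser is deliberately naive: it does not handle values that themselves contain the delimiter, so choose a --output_delimiter that cannot occur in the data.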

For comparison, first run the queries in Hive:

hive> select * from weather.weather_everydate_detail limit 10;
OK
WOCE_P10	1993	279.479	-16.442	172.219	24.9544	34.8887	1.0035	363.551	2
WOCE_P10	1993	279.48	-16.44	172.214	24.9554	34.8873	1.0035	363.736	2
WOCE_P10	1993	279.48	-16.439	172.213	24.9564	34.8868	1.0033	363.585	2
WOCE_P10	1993	279.481	-16.438	172.209	24.9583	34.8859	1.0035	363.459	2
WOCE_P10	1993	279.481	-16.437	172.207	24.9594	34.8859	1.0033	363.543	2
WOCE_P10	1993	279.481	-16.436	172.205	24.9604	34.8858	1.0035	363.432	2
WOCE_P10	1993	279.489	-16.417	172.164	24.9743	34.8867	1.0036	362.967	2
WOCE_P10	1993	279.49	-16.414	172.158	24.9742	34.8859	1.0035	362.96	2
WOCE_P10	1993	279.491	-16.412	172.153	24.9747	34.8864	1.0033	362.998	2
WOCE_P10	1993	279.492	-16.411	172.148	24.9734	34.8868	1.0031	363.022	2
Time taken: 0.815 seconds, Fetched: 10 row(s)

hive> select count(*) from weather.weather_everydate_detail;
Query ID = root_20171214185454_c783708d-ad4b-46cc-9341-885c16a286fe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1512525269046_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1512525269046_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1512525269046_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-12-14 18:55:27,386 Stage-1 map = 0%,  reduce = 0%
2017-12-14 18:56:11,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 39.36 sec
2017-12-14 18:56:18,711 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 41.88 sec
MapReduce Total cumulative CPU time: 41 seconds 880 msec
Ended Job = job_1512525269046_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 41.88 sec   HDFS Read: 288541 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 41 seconds 880 msec
OK
4018
Time taken: 101.82 seconds, Fetched: 1 row(s)

1. Start the Impala CLI

[root@quickstart cloudera]# impala-shell
Starting Impala Shell……

2. Synchronize metadata in Impala

[quickstart.cloudera:21000] > INVALIDATE METADATA;
Query: invalidate METADATA
Query submitted at: 2017-12-14 19:01:12 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=43460ace5d3a9971:9a50f46600000000
 
Fetched 0 row(s) in 3.25s

3. View the structure of the Hive table from Impala

[quickstart.cloudera:21000] > use weather;
Query: use weather
[quickstart.cloudera:21000] > desc weather.weather_everydate_detail;
Query: describe weather.weather_everydate_detail
+---------+--------+---------+
| name    | type   | comment |
+---------+--------+---------+
| section | string |         |
| year    | bigint |         |
| date    | double |         |
| latim   | double |         |
| longit  | double |         |
| sur_tmp | double |         |
| sur_sal | double |         |
| atm_per | double |         |
| xco2a   | double |         |
| qf      | bigint |         |
+---------+--------+---------+
Fetched 10 row(s) in 3.70s

4. Count the records

[quickstart.cloudera:21000] > select count(*) from weather.weather_everydate_detail;

Query: select count(*) from weather.weather_everydate_detail
Query submitted at: 2017-12-14 19:03:11 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=5542894eeb80e509:1f9ce37f00000000
+----------+
| count(*) |
+----------+
| 4018     |
+----------+
Fetched 1 row(s) in 2.51s

Note: for the same count query, Impala took 2.51 s versus 101.82 s in Hive (roughly 40 times faster), so Impala's advantage here is quite clear.

5. Run an ordinary query

[quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_sal=34.8105;

Query: select * from weather_everydate_detail where sur_sal=34.8105
Query submitted at: 2017-12-14 19:20:27 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=c14660ed0bda471f:d92fcf0e00000000
+----------+------+---------+--------+---------+---------+---------+---------+---------+----+
| section  | year | date    | latim  | longit  | sur_tmp | sur_sal | atm_per | xco2a   | qf |
+----------+------+---------+--------+---------+---------+---------+---------+---------+----+
| WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081  | 361.29  | 2  |
| WOCE_P10 | 1993 | 312.155 | 34.602 | 141.954 | 24.0638 | 34.8105 | 1.0079  | 360.386 | 2  |
+----------+------+---------+--------+---------+---------+---------+---------+---------+----+
Fetched 2 row(s) in 0.25s

[quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_tmp=24.0804;

Query: select * from weather_everydate_detail where sur_tmp=24.0804
Query submitted at: 2017-12-14 23:15:32 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=774e2b3b81f4eed7:8952b5b400000000
+----------+------+---------+--------+---------+---------+---------+---------+--------+----+
| section  | year | date    | latim  | longit  | sur_tmp | sur_sal | atm_per | xco2a  | qf |
+----------+------+---------+--------+---------+---------+---------+---------+--------+----+
| WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081  | 361.29 | 2  |
+----------+------+---------+--------+---------+---------+---------+---------+--------+----+
Fetched 1 row(s) in 3.86s

6. Conclusion

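For SQL that Hive has to compile into MapReduce jobs, Impala is clearly faster. However, Hive does not compile every query into MapReduce (the select ... limit 10 above launched no job and returned in 0.815 s), and for such queries Impala has little advantage over Hive.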

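II. Connecting to Impala from Java via JDBC

The program below loads the Cloudera Impala JDBC41 driver (its jar must be on the classpath) and runs a count query against the impalad JDBC port 21050: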

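import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class test_jdbc {
    // Driver class from the Cloudera Impala JDBC41 driver jar
    private static final String JDBC_DRIVER = "com.cloudera.impala.jdbc41.Driver";
    // 21050 is the impalad port for JDBC/ODBC clients (impala-shell uses 21000)
    private static final String CONNECTION_URL = "jdbc:impala://192.168.2.20:21050";

    public static void test() {
        try {
            // Load the driver class explicitly
            Class.forName(JDBC_DRIVER);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }
        // try-with-resources closes whichever of the three resources were opened,
        // in reverse order, so a failed connection no longer causes a NullPointerException
        try (Connection con = DriverManager.getConnection(CONNECTION_URL);
             // no trailing semicolon: JDBC takes a single statement, not a script
             PreparedStatement ps = con.prepareStatement("select count(*) from billdetail");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        test();
    }
}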
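As a follow-up, the filter from step 5 can also be run through JDBC with a bind parameter instead of concatenating the value into the SQL string. This is a sketch under stated assumptions: the jdbc:impala://host:port/database URL form and the AuthMech=0 (no authentication) property are what we understand the Cloudera Impala JDBC driver to accept, the driver version in use is assumed to support ? parameter markers, and ImpalaJdbcQuery, url and countBySalinity are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: runs the step-5 filter through JDBC with a bind parameter.
class ImpalaJdbcQuery {

    // Assumed Cloudera URL form: jdbc:impala://host:port/database; AuthMech=0 = no auth
    static String url(String host, int port, String database) {
        return "jdbc:impala://" + host + ":" + port + "/" + database + ";AuthMech=0";
    }

    // Counts rows whose sur_sal equals the given value; the value is bound
    // with setDouble rather than concatenated into the SQL text.
    static long countBySalinity(Connection con, double surSal) throws SQLException {
        String sql = "select count(*) from weather_everydate_detail where sur_sal = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setDouble(1, surSal);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class.forName("com.cloudera.impala.jdbc41.Driver"); // driver jar must be on the classpath
        try (Connection con = DriverManager.getConnection(url("192.168.2.20", 21050, "weather"))) {
            System.out.println(countBySalinity(con, 34.8105));
        }
    }
}
```

Binding keeps user-supplied values out of the SQL text. As with any floating-point column, exact equality only matches values stored exactly as written, so a range predicate is safer for computed values.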