Connecting to Spark / Hive with beeline

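The JDK 1.7.0_51, Hive 0.12 and Hadoop 2.3.0 installations on the Hive client host were copied over from the server, and the environment variables were all set correctly.
Connecting failed with an Invalid URL error: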
$ beeline
Beeline version 0.12.0 by Apache Hive
beeline> !connect jdbc:hive2://cloud011:10000
scan complete in 2ms
Connecting to jdbc:hive2://cloud011:10000
Enter username for jdbc:hive2://cloud011:10000:
Enter password for jdbc:hive2://cloud011:10000:
Error: Invalid URL: jdbc:hive2://cloud011:10000 (state=08S01,code=0)

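For a while I was stuck on the format of the JDBC URL. Then I found an approach on the Cloudera forum:
a test that calls the jdbc:hive2 driver directly worked fine, which showed that CLASSPATH and the other environment variables were not the problem.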

At this point it looked like the client was probably not at fault, so suspicion shifted to the server side:
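A minimal sketch of one way to inspect the server side, assuming the default port 10000 and that `ss` is installed on the server. `classify_bind` is a hypothetical helper written for this illustration, not part of Hive or Spark: it takes the local-address column of a listening socket and reports whether remote clients could reach it.

```shell
# Hypothetical helper (not part of Hive/Spark): classify a "Local Address"
# value as printed by `ss -ltn` or `netstat -ltn`.
classify_bind() {
  case "$1" in
    127.*:*|localhost:*|"[::1]":*|::1:*) echo "loopback-only" ;;
    0.0.0.0:*|"*":*|":::"*|"[::]":*)     echo "all-interfaces" ;;
    *)                                   echo "specific-address" ;;
  esac
}

PORT=10000  # assumed: the default HiveServer2 / Thrift JDBC Server port

# Only query live sockets when `ss` exists, so the sketch is safe to run anywhere.
if command -v ss >/dev/null 2>&1; then
  ss -ltn | awk -v suffix=":${PORT}" '
    NR > 1 && substr($4, length($4) - length(suffix) + 1) == suffix { print $4 }
  ' | while read -r addr; do
    echo "${addr} -> $(classify_bind "$addr")"
  done
fi
```

A result of "loopback-only" would mean the server accepts connections only from the machine it runs on, so a remote beeline client cannot connect.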

The server turned out to be bound to localhost, which resolves to 127.0.0.1. That was the likely cause, so I tested from the server itself:

Connected successfully!

Next, change the parameter and restart the service.
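The post does not show which parameter was changed. One plausible candidate, stated here as an assumption, is HiveServer2's `hive.server2.thrift.bind.host`, which can be set in hive-site.xml or passed with `--hiveconf`. The sketch below only prints the start command (a dry run); the hostname cloud011 comes from the session above, and launching through `hive --service hiveserver2` is also an assumption about how the service is started.

```shell
# Assumption: the setting in question is hive.server2.thrift.bind.host.
BIND_HOST="cloud011"   # hostname that clients use in the JDBC URL
PORT=10000

# Assumption: the server is launched through the `hive` wrapper script.
START_CMD="hive --service hiveserver2 --hiveconf hive.server2.thrift.bind.host=${BIND_HOST} --hiveconf hive.server2.thrift.port=${PORT}"

# Dry run: print the command instead of executing it.
echo "$START_CMD"
```

Putting the same property in hive-site.xml would make the change survive restarts without extra command-line options.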

After the restart, check the listening address again; this time it was correct.

Test the connection from the client host again:

Success.

About the Thrift JDBC Server

The Thrift JDBC Server is based on the HiveServer2 implementation from Hive 0.12. You can interact with it using the beeline script shipped with either Spark or Hive 0.12. By default it listens on port 10000.

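Before using the Thrift JDBC Server, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Add the JDBC driver jar (here, the MySQL connector) to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh: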

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar

Help for the Thrift JDBC Server command:

cd $SPARK_HOME/sbin
./start-thriftserver.sh --help

Usage: ./sbin/start-thriftserver [options] [thrift server options]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

Thrift server options:
  --hiveconf <property=value>  Use value for given property

The --master option has the same meaning as in the Spark SQL CLI.

Help for the beeline command:

cd $SPARK_HOME/bin
./beeline --help

Starting the Thrift JDBC Server and beeline

Start the Thrift JDBC Server (the default port is 10000):

cd $SPARK_HOME/sbin
./start-thriftserver.sh

To change the default listening port of the Thrift JDBC Server, pass the setting with --hiveconf:

./start-thriftserver.sh --hiveconf hive.server2.thrift.port=14000
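Once the server listens on a non-default port, the client URL has to carry the same port. A small sketch of deriving the beeline URL from it; the host hadoop000, the database default and the user hadoop are taken from the beeline example in this post, and the variable names are purely illustrative.

```shell
THRIFT_HOST="hadoop000"
THRIFT_PORT=14000       # must match the value given to hive.server2.thrift.port
DATABASE="default"

JDBC_URL="jdbc:hive2://${THRIFT_HOST}:${THRIFT_PORT}/${DATABASE}"
echo "$JDBC_URL"

# To connect for real (requires the server to be running):
# "$SPARK_HOME/bin/beeline" -u "$JDBC_URL" -n hadoop
```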

For details on HiveServer2 clients, see: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Start beeline:

cd $SPARK_HOME/bin
./beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop

Testing with SQL statements

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;
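The two queries can also be run non-interactively. A sketch, assuming the page_views table exists and the server from the example above is reachable: it writes the statements to a temporary file and hands that file to beeline's `-f` option. `RUN_BEELINE` is a hypothetical guard variable introduced here so that nothing is sent to a server unless explicitly requested.

```shell
SQL_FILE="$(mktemp)"

# Write the two test queries from this post into the script file.
cat > "$SQL_FILE" <<'EOF'
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;
EOF

echo "wrote $(grep -c ';' "$SQL_FILE") statements to $SQL_FILE"

# Hypothetical guard: only contact the server when RUN_BEELINE=1 is set.
if [ "${RUN_BEELINE:-0}" = "1" ]; then
  "$SPARK_HOME/bin/beeline" -u jdbc:hive2://hadoop000:10000/default -n hadoop -f "$SQL_FILE"
fi
```

For a single statement, beeline's `-e` option takes the query directly on the command line instead of a file.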
