A First Look at Hive

Download resources:

http://hbase.apache.org/book/quickstart.html
http://hive.apache.org/downloads.html

Hortonworks HDP releases and their corresponding Hadoop, Hive, and HBase versions:

HDP 2.0.6 / Hadoop 2.2.0, Hive 0.12.0, HBase 0.96.1
HDP 1.3.3 / Hadoop 1.2.0, Hive 0.11.0, HBase 0.94.6
HDP 2.1   / Hadoop 2.4.0, Hive 0.13,   HBase 0.98
HDP 2.2   / Hadoop 2.6.0, Hive 0.14,   HBase 0.98


Installing and configuring Hive is quite simple: download it from its website (http://hive.apache.org/downloads.html) and unpack it on a machine that already has Hadoop installed. Add hadoop/bin to the system PATH and Hive is ready to work. Since 0.11, Hive supports both the Hadoop 0.20 and Hadoop 0.23 lines.

Simply running ${hive-install}/bin/hive drops you into the Hive shell.

1. hive

1.1 show tables:
From the hive shell (run hive or $HIVE_HOME/bin/hive):
hive> show tables;

1.2 create table:

From the hive shell (run hive or $HIVE_HOME/bin/hive):
hive> CREATE TABLE pokes (foo INT, bar STRING);

1.3 drop table:
From the hive shell (run hive or $HIVE_HOME/bin/hive):
hive> DROP TABLE pokes;

1.4 query tables:
From the hive shell (run hive or $HIVE_HOME/bin/hive):
hive> SELECT * FROM pokes p;

Please note that Hive is a data warehouse: there is no INSERT INTO ... VALUES (...) statement (support for it only arrived in Hive 0.14), but you can insert or load data from other sources, such as files or other tables.

1.5 execute HiveQL without entering the hive shell:
hive -e "show tables;"           -- or $HIVE_HOME/bin/hive -e "show tables;"
hive -e "select * from pokes;"   -- or $HIVE_HOME/bin/hive -e "select * from pokes;"

1.6 execute a DDL/DML file without entering the hive shell:
hive -f hsql.ddl

1.7 load data

1) load data from a key/value file separated by ^A (Ctrl+A), Hive's default field delimiter

create table pokes1 (id int, name string);

pokes1.data:

1^Atony

2^ASmith

2) load data from a key/value file separated by '\t' (Tab)

CREATE TABLE pokes2 ( userid INT,name STRING) ROW FORMAT DELIMITED  FIELDS  TERMINATED BY '\t'  STORED AS TEXTFILE;

pokes2.data:

36 Smith
40 Tony
64 Huang
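The two data files above can also be generated programmatically. A minimal Python sketch (the file names pokes1.data/pokes2.data and the rows match the listings above; the LOAD DATA statements in the comments are what a Hive session would then run):

```python
# Write the sample data files for pokes1 (^A-delimited, Hive's default)
# and pokes2 (tab-delimited), matching the listings above.
rows1 = [(1, "tony"), (2, "Smith")]
rows2 = [(36, "Smith"), (40, "Tony"), (64, "Huang")]

with open("pokes1.data", "w") as f:
    for id_, name in rows1:
        f.write(f"{id_}\x01{name}\n")   # \x01 is ^A (Ctrl+A)

with open("pokes2.data", "w") as f:
    for id_, name in rows2:
        f.write(f"{id_}\t{name}\n")

# In the hive shell the files would then be loaded with:
#   LOAD DATA LOCAL INPATH './pokes1.data' OVERWRITE INTO TABLE pokes1;
#   LOAD DATA LOCAL INPATH './pokes2.data' OVERWRITE INTO TABLE pokes2;
```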


===================sample hsql.ddl================================
-- -------------- Create hive hbase table -----------------

DROP TABLE if exists hbase_drug1n2row;

CREATE EXTERNAL TABLE hbase_drug1n2row(rowid STRING,age STRING,sex STRING,bp STRING,cholesterol STRING,na STRING,k STRING,drug STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,drug:age,drug:sex,drug:bp,drug:cholesterol,drug:na,drug:k,drug:drug")
TBLPROPERTIES("hbase.table.name" = "drug1n2row");

2. hbase

2.1 enter shell:

./bin/hbase shell
hbase(main):001:0>

2.2 show all tables:

hbase> list

2.3 list the table named test:

hbase> list 'test'
TABLE
test
1 row(s) in 0.0350 seconds
----------------

2.4 create a table named test with a column family named cf

hbase> create 'test', 'cf'    
0 row(s) in 1.2200 seconds
----------------
=> ["test"]

2.5 put data into a table

1) put a single cell; the row key is row1, the value is value1:
hbase> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds

2) put multiple cells into a table, example 1:

hbase> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds
hbase> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0160 seconds
hbase> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0260 seconds     

3) put multiple cells into a table, example 2:

hbase> put 'test2', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds
hbase> put 'test2', 'row1', 'cf:b', 'value2'
0 row(s) in 0.0160 seconds
hbase> put 'test2', 'row2', 'cf:a', 'value3'
0 row(s) in 0.0260 seconds    
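Conceptually, the puts in example 2 build a sparse map of maps: each table maps a row key to a map from column (family:qualifier) to value, and rows need not have the same columns. A small Python sketch of that model (the put/get helpers are illustrative only, not an HBase API):

```python
# Model an HBase table as {row_key: {"family:qualifier": value}}.
def put(table, row, column, value):
    table.setdefault(row, {})[column] = value

def get(table, row):
    return table.get(row, {})

test2 = {}
put(test2, "row1", "cf:a", "value1")
put(test2, "row1", "cf:b", "value2")
put(test2, "row2", "cf:a", "value3")

# get 'test2', 'row1' would return both cells of row1;
# row2 holds only cf:a -- rows are sparse.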

2.6 scan table for all data:

hbase> scan 'test'

2.7 disable/enable table:

hbase> disable 'test'

hbase> enable 'test'

2.8 drop a table (it must be disabled first):

hbase> disable 'test'
hbase> drop 'test'

2.9 get a specific row:

hbase> get 'test', 'row1'

3. Hive/HBase integration

HBase tables can be accessed from Hive. You can create an associated HBase table (living in the HBase database) at the same time you create a Hive table, or you can create a Hive table on top of an already existing HBase table.

3.1 create a Hive table from an existing HBase table:

Make sure the HBase table test has already been created in HBase, with column family cf.

export HADOOP_CLASSPATH=/etc/hbase/conf:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/usr/lib/zookeeper/zookeeper.jar:$HADOOP_CLASSPATH

Enter the hive shell and run the following command:

 hive> create external table  hbase_test (id STRING, value STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b") TBLPROPERTIES("hbase.table.name"="test");
This fails with: org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements. The reason is that hbase_test declares 2 columns (id, value) while the HBase mapping lists 3 (:key,cf:a,cf:b). Changing :key,cf:a,cf:b to :key,cf:a fixes it:

create external table  hbase_test (id STRING, value STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a") TBLPROPERTIES("hbase.table.name"="test");

This maps the HBase table's row key (in an HBase put statement, the token right after the table name is the row key; for example, put 'test', 'row1', 'cf:a', 'value1' writes a cell whose row key is row1) to the id column of the Hive table (here hbase_test), and column a of column family cf to the Hive table's value column.
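The column-count rule behind the HBaseSerDe error above can be stated mechanically: the Hive table must declare exactly as many columns as there are comma-separated entries in hbase.columns.mapping, pairing them positionally. A hypothetical helper (not part of Hive) that reproduces the check:

```python
def check_mapping(hive_columns, hbase_columns_mapping):
    """Mimic HBaseSerDe's sanity check: one Hive column per mapping entry."""
    mapping = hbase_columns_mapping.split(",")
    if len(hive_columns) != len(mapping):
        raise ValueError(
            f"columns has {len(hive_columns)} elements while "
            f"hbase.columns.mapping has {len(mapping)} elements"
        )
    # Positional pairing: first Hive column <-> first mapping entry, etc.
    return list(zip(hive_columns, mapping))

# The failing definition had 2 Hive columns vs 3 mapping entries.
# The fixed definition pairs id with :key and value with cf:a:
pairs = check_mapping(["id", "value"], ":key,cf:a")
```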

4. Common problems:

1)java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/MasterNotRunningException
This usually means hbase-*.jar is not on the classpath; add the relevant jars to HADOOP_CLASSPATH before entering the hive shell.

For example, HDP 1.3.3 throws this error. Run export HADOOP_CLASSPATH=/etc/hbase/conf:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/usr/lib/zookeeper/zookeeper.jar:$HADOOP_CLASSPATH and then enter the hive shell. (This puts the jars under the HBase install directory /usr/lib/hbase, such as hbase-0.94.6.1.3.3.0-58-security.jar, onto HADOOP_CLASSPATH.)

2)  java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B

Replace libthrift-0.8.0.jar under ${hbase}/lib with libthrift-0.9.0.jar (available under $HIVE/lib).

References:

1. HBase home: http://hbase.apache.org/#

2. HBase quickstart: http://hbase.apache.org/book/quickstart.html

3. Hive/HBase integration: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

4. Hive home: https://cwiki.apache.org/confluence/display/Hive/Home

5. Hive getting started: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

6. Hortonworks HDP Hive/HBase tutorials:

http://hortonworks.com/blog/using-hive-to-interact-with-hbase-part-2/
http://hortonworks.com/community/forums/topic/hive-external-table-pointing-to-hbase-2/




Reprinted from blog.csdn.net/tonyhuang_google_com/article/details/41286565