1. Managed tables

--Table creation keywords

1. Create a table and specify the field delimiter

row format delimited fields terminated by '\t'

create table if not exists stu2(id int ,name string) row format delimited fields terminated by '\t' stored as textfile location '/user/stu2';

2. Create a table from a query result

create table stu3 as select * from stu2;

3. Create a table from the structure of an existing table

like

create table stu4 like stu2;
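The two shortcuts above behave differently: as select copies both the columns and the rows, while like copies only the table definition. A minimal sketch to check this, assuming stu2 already holds some rows:

```sql
-- CTAS: stu3 has stu2's columns and a copy of its rows
select count(*) from stu3;

-- LIKE: stu4 has the same columns as stu2 but starts out empty
select count(*) from stu4;

-- Compare the storage details (location, table type, delimiter) of both tables
desc formatted stu3;
desc formatted stu4;
```

Note that create table ... as select does not inherit the '\t' delimiter from stu2 unless a row format clause is added to the statement, whereas like keeps the source table's storage definition.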

4. External tables

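An external table points at data stored at some other HDFS path, so Hive does not consider itself the sole owner of that data. Dropping the table removes only its metadata; the data files stay on HDFS and are not deleted.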

external

create external table techer (t_id string,t_name string) row format delimited fields terminated by '\t';

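If the table is dropped, its data is still on HDFS, and once the table is recreated at the same location the data is immediately queryable again, because drop table on an external table leaves the files in place.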
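The drop-and-recreate behaviour can be sketched as follows. The local file path is hypothetical, and the sketch assumes the table is recreated in the same database so that it resolves to the same default location:

```sql
-- Hypothetical tab-separated file; replace with a real path
load data local inpath '/export/servers/hivedatas/teacher.csv' into table techer;
select * from techer;

-- Dropping an external table removes the metadata only
drop table techer;

-- Recreate the table with the same definition (same default location)
create external table techer (t_id string,t_name string) row format delimited fields terminated by '\t';

-- The rows loaded earlier should be visible again without reloading
select * from techer;
```

Running the same sequence against a managed table (without external) would delete the files at the drop table step.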

5. Partitioned tables

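Divide and conquer is one of the most common ideas in big data: split a large file into many small ones, so that each operation only has to handle a small file. Hive supports the same idea through partitions. A large data set can be split by day or by hour, for example, and a query then works only on the slices it needs.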

partitioned by (...)

Syntax for creating a partitioned table

create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

Create a table with multiple partition columns

create table score2 (s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';

Load data into a partitioned table

load data local inpath '/export/servers/hivedatas/score.csv' into table score partition (month='201806');

View partitions

show partitions score;

Add a partition

alter table score add partition(month='201805');

Note: after a partition is added, a new directory for it appears under the table's directory on HDFS.
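A few more partition operations, as a sketch building on the score and score2 tables defined above. The partition values chosen for score2 are only illustrative:

```sql
-- Filter on the partition column so that only that partition's directory is read
select * from score where month = '201806';

-- For a table with several partition columns, give a value for each one
load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition (year='2018', month='06', day='01');

-- Add several partitions in one statement
alter table score add partition (month='201804') partition (month='201803');

-- Drop a partition; for a managed table this also deletes the partition's data
alter table score drop partition (month='201805');
```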

6. Bucketed tables

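Bucketing divides the data into a fixed number of buckets according to a chosen column. Put simply, rows are assigned by the hash of that column's value to one of several files.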

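Enable Hive's bucketing enforcement (needed on older versions; from Hive 2.0 on, this property was removed and bucketing is always enforced)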

set hive.enforce.bucketing=true;

Set the number of reducers (here matching the 3 buckets of the table below)

set mapreduce.job.reduces=3;

Create a bucketed table

create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';

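Data can only be loaded into a bucketed table with insert overwrite. Copying files in with hdfs dfs -put or with load data does not work, because those commands place the files as they are and never distribute the rows into buckets.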

Create a plain table, then use insert overwrite with a query to load the plain table's data into the bucketed table
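The step described above is not spelled out in the text, so here is one possible sketch. The staging table name course_common and the local file path are assumptions:

```sql
-- Plain staging table with the same columns as course (name is hypothetical)
create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';

-- Hypothetical tab-separated file; replace with a real path
load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;

-- Rows are hashed on c_id and written into the 3 bucket files of course
insert overwrite table course select * from course_common cluster by (c_id);

-- Read back a single bucket to check the result
select * from course tablesample (bucket 1 out of 3 on c_id);
```

After the insert, the directory of course on HDFS should contain three files, one per bucket.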

7. Loading data into Hive tables

There are 5 ways

1. Insert data directly into a partitioned table

insert into table score3 partition(month ='201807') values ('001','002','100');

2. Insert data from a query

insert overwrite table score4 partition(month = '201806') select s_id,c_id,s_score from score;
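The statement above writes every selected row into one fixed partition, whatever month it came from. If each row should instead land in the partition matching its own month value, a dynamic-partition insert is one option. This is a sketch; score4 is not defined in the text, so it is assumed here to have the same layout as score:

```sql
-- Assumption: score4 has the same columns and partition column as score
create table if not exists score4 like score;

-- Let Hive derive the partition value from the data
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the select list
insert overwrite table score4 partition (month) select s_id, c_id, s_score, month from score;

show partitions score4;
```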

3. Multi-insert mode

from score

insert overwrite table score_first partition(month='201806') select s_id,c_id

insert overwrite table score_second partition(month = '201806') select c_id,s_score;
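The multi-insert statement scans score once and writes to two targets. The text does not show how score_first and score_second are defined; one layout that matches the selected columns, offered as an assumption, is:

```sql
-- Target for "select s_id,c_id"
create table if not exists score_first (s_id string, c_id string) partitioned by (month string) row format delimited fields terminated by '\t';

-- Target for "select c_id,s_score"
create table if not exists score_second (c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';
```

Both tables need to exist before the from score ... insert overwrite ... statement is run.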

4. Create a table and load data in a single query statement

create table score5 as select * from score;

5. Specify the data path with location when creating the table

create external table score6 (s_id string,c_id string,s_score int) row format delimited fields terminated by '\t' location '/myscore6';
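With location, the table simply reads whatever files sit in that directory, so placing a file there is enough to "load" it. A sketch run from the Hive CLI, assuming the score.csv file used earlier is tab-separated with three columns:

```sql
-- dfs commands can be issued from inside the Hive CLI
dfs -mkdir -p /myscore6;
dfs -put /export/servers/hivedatas/score.csv /myscore6;

-- No load data step is needed; the file is read in place
select * from score6 limit 10;
```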

8. Exporting data from Hive tables

There are 7 ways

1. Export query results to the local file system

insert overwrite local directory '/export/servers/exporthive/a' select * from score;

2. Export query results to the local file system with formatting

insert overwrite local directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;

3. Export query results to HDFS (omit local)

insert overwrite directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from score;

4. Export to the local file system with a Hadoop command (run from inside the Hive CLI)

dfs -get /export/servers/exporthive/000000_0 /export/servers/exporthive/local.txt;

5. Export with the hive shell command

bin/hive -e "select * from myhive.score;" > /export/servers/exporthive/score.txt

6. Export to HDFS with export

export table score to '/export/exporthive/score';
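A directory written by export contains both the metadata and the data, so it can be read back with import. A short sketch; score_imported is a hypothetical name for the new table:

```sql
-- Recreate the table, including its partitions, from the exported directory
import table score_imported from '/export/exporthive/score';

show partitions score_imported;
```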

7. Export data with sqoop

kismetG


HIVE -- Managed tables (table delimiters, external tables, partitioned tables, bucketed tables, loading data into Hive, exporting data)