Hive - Partitioned Tables

1. Create a partitioned table

hive (default)> create table order_partition(orderNumber string,event_time string)PARTITIONED BY(event_month string) row format delimited fields terminated by '\t';

2. Load a text file into the partitioned table

hive (default)> load data local inpath '/home/hadoop/data/order.txt' into table order_partition PARTITION (event_month='2014-05');
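(If the load fails with an error log mentioning something like "too long", log in to the MySQL database that backs the Hive metastore, ruoze_d5 in this setup, and convert the partition metadata tables to latin1:)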
mysql> use ruoze_d5;
mysql> alter table PARTITIONS convert to character set latin1;
mysql> alter table PARTITION_KEYS convert to character set latin1;
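One common cause of such "too long" errors is the index key length limit. The sketch below is toy arithmetic, not a MySQL API, and rests on assumptions not stated in the original: an index key limit of 767 bytes (older InnoDB row formats), a worst case of 3 bytes per character for utf8 versus 1 for latin1, and an indexed VARCHAR(767) metastore column such as PART_NAME. Under those assumptions, the same column fits the limit in latin1 but not in utf8.

```python
# Toy arithmetic, not a MySQL API.
# Assumptions: 767-byte index key limit, worst-case bytes per character per
# charset as below, and an indexed VARCHAR(767) column (e.g. PART_NAME).

INDEX_KEY_LIMIT_BYTES = 767
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def key_bytes(varchar_len: int, charset: str) -> int:
    """Worst-case bytes an index on VARCHAR(varchar_len) needs."""
    return varchar_len * MAX_BYTES_PER_CHAR[charset]


def fits_in_index(varchar_len: int, charset: str) -> bool:
    """True if the worst-case key width stays within the assumed limit."""
    return key_bytes(varchar_len, charset) <= INDEX_KEY_LIMIT_BYTES


for cs in ("latin1", "utf8"):
    print(cs, key_bytes(767, cs), fits_in_index(767, cs))
```

This is why switching those tables to latin1 makes the error go away in this scenario; the exact column widths depend on your metastore schema version.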

3. After creating the partitioned table and loading data, inspect the table on HDFS

[hadoop@hadoop001 data]$ hdfs dfs -ls /user/hive/warehouse/order_partition
drwxr-xr-x   - hadoop supergroup          0 2018-11-09 14:51 /user/hive/warehouse/order_partition/event_month=2014-05
[hadoop@hadoop001 data]$ hdfs dfs -ls /user/hive/warehouse/order_partition/event_month=2014-05
-rwxr-xr-x   1 hadoop supergroup        208 2018-11-09 14:51 /user/hive/warehouse/order_partition/event_month=2014-05/order.txt
(Note: a directory named like event_month=2014-05 is a telltale sign of a partitioned table.)
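The listing above shows the layout convention: each partition is a directory named column=value under the table directory. A minimal Python sketch of that mapping follows; the helper names partition_path and parse_partition_dirs are hypothetical, and real Hive also escapes some special characters in partition values, which this sketch skips.

```python
from typing import Dict, List, Tuple


def partition_path(warehouse: str, table: str, spec: List[Tuple[str, str]]) -> str:
    """Build <warehouse>/<table>/<col1>=<val1>/<col2>=<val2>... (no escaping)."""
    parts = [f"{col}={val}" for col, val in spec]
    return "/".join([warehouse.rstrip("/"), table] + parts)


def parse_partition_dirs(path: str) -> Dict[str, str]:
    """Collect every col=val segment of an HDFS path into a dict."""
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            col, val = segment.split("=", 1)
            spec[col] = val
    return spec


print(partition_path("/user/hive/warehouse", "order_partition", [("event_month", "2014-05")]))
print(parse_partition_dirs("/user/hive/warehouse/order_partition/event_month=2014-05/order.txt"))
```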

4. Manually create a partition-style directory under the order_partition table from step 3, and compare the result with step 3

[hadoop@hadoop001 data]$ hdfs dfs -mkdir -p /user/hive/warehouse/order_partition/event_month=2014-06
(Create a partition-style directory here by hand.)
[hadoop@hadoop001 data]$ hdfs dfs -put order.txt /user/hive/warehouse/order_partition/event_month=2014-06
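(Then upload order.txt into that directory by hand.)
hive (default)> select * from order_partition where event_month='2014-05'; (When querying a partitioned table, include the partition condition; otherwise Hive scans every partition under order_partition.)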
10703007267488 2014-05-01 06:01:12.334+01 2014-05
10101043505096 2014-05-01 07:28:12.342+01 2014-05
10103043509747 2014-05-01 07:50:12.33+01 2014-05
10103043501575 2014-05-01 09:27:12.33+01 2014-05
10104043514061 2014-05-01 09:03:12.324+01 2014-05
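hive (default)> select * from order_partition where event_month='2014-06'; (This returns nothing. The data is on HDFS, but the metastore has no record of this partition. When a partition is created normally with load, Hive registers it in the metastore automatically; a directory created by hand carries no metadata.)
hive (default)> msck repair table order_partition; (Refresh the partitions of order_partition, registering any partition directories the metastore is missing.)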
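The behavior in the two queries above can be modeled with a toy class: data lives in directories, but a query only reads partitions the metastore knows about, and a partition predicate limits the read to that one partition. ToyPartitionedTable and its methods are hypothetical names for illustration; this is a simplification, not Hive's implementation.

```python
from typing import Dict, List, Set


class ToyPartitionedTable:
    """Toy model: HDFS directories hold the rows, but queries only see
    partitions registered in the metastore."""

    def __init__(self) -> None:
        self.hdfs: Dict[str, List[str]] = {}  # partition value -> rows on disk
        self.metastore: Set[str] = set()      # partitions Hive knows about

    def load(self, value: str, rows: List[str]) -> None:
        # Like LOAD DATA ... PARTITION(...): writes files AND registers metadata.
        self.hdfs.setdefault(value, []).extend(rows)
        self.metastore.add(value)

    def put_directly(self, value: str, rows: List[str]) -> None:
        # Like hdfs dfs -mkdir / -put: writes files only, metastore untouched.
        self.hdfs.setdefault(value, []).extend(rows)

    def select(self, value: str) -> List[str]:
        # Partition pruning: read only the requested partition, and only if registered.
        if value not in self.metastore:
            return []
        return list(self.hdfs.get(value, []))


table = ToyPartitionedTable()
table.load("2014-05", ["10703007267488"])
table.put_directly("2014-06", ["10703007267488"])
print(table.select("2014-05"))  # ['10703007267488']
print(table.select("2014-06"))  # [] until the partition is registered
```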
Partitions not in metastore: order_partition:event_month=2014-06
Repair: Added partition to metastore order_partition:event_month=2014-06

hive (default)> select * from order_partition where event_month='2014-06';
10703007267488 2014-05-01 06:01:12.334+01 2014-06
10101043505096 2014-05-01 07:28:12.342+01 2014-06
10101043505096 2014-05-01 07:28:12.342+01 2014-06
10103043501575 2014-05-01 09:27:12.33+01 2014-06
10104043514061 2014-05-01 09:03:12.324+01 2014-06
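(Note: avoid msck repair table order_partition in production. It refreshes every partition of the table, which performs very poorly. Instead, register the partition explicitly, as follows:)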
[hadoop@hadoop001 data]$ hdfs dfs -mkdir -p /user/hive/warehouse/order_partition/event_month=2014-07
[hadoop@hadoop001 data]$ hdfs dfs -put order.txt /user/hive/warehouse/order_partition/event_month=2014-07
hive (default)> alter table order_partition add partition(event_month='2014-07'); (In production, this is the usual way to add a partition's metadata.)
hive (default)> select * from order_partition where event_month='2014-07';
10703007267488 2014-05-01 06:01:12.334+01 2014-07
10101043505096 2014-05-01 07:28:12.342+01 2014-07
10101043505096 2014-05-01 07:28:12.342+01 2014-07
10103043501575 2014-05-01 09:27:12.33+01 2014-07
10104043514061 2014-05-01 09:03:12.324+01 2014-07
hive (default)> show partitions order_partition; (List the partitions of order_partition.)
event_month=2014-05
event_month=2014-06
event_month=2014-07
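To contrast the two repair approaches in step 4, here is a toy sketch, not Hive code: msck repair is modeled as listing every partition directory and registering the missing ones, so its cost grows with the number of directories, while add partition registers exactly one known partition. Show partitions reads only the metastore. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def msck_repair(hdfs_partition_dirs: Iterable[str], metastore: Set[str]) -> List[str]:
    """Toy MSCK REPAIR: walk EVERY partition directory, register missing ones.
    Work grows with the total number of directories under the table."""
    added: List[str] = []
    for d in sorted(hdfs_partition_dirs):  # full listing of the table directory
        if d not in metastore:
            metastore.add(d)
            added.append(d)
    return added


def add_partition(partition: str, metastore: Set[str]) -> bool:
    """Toy ALTER TABLE ... ADD PARTITION: registers one known partition.
    Returns False if it already exists (real Hive raises an error here
    unless ADD IF NOT EXISTS is used)."""
    if partition in metastore:
        return False
    metastore.add(partition)
    return True


def show_partitions(metastore: Set[str]) -> List[str]:
    """Toy SHOW PARTITIONS: reads the metastore only, never HDFS."""
    return sorted(metastore)


ms = {"event_month=2014-05"}
print(msck_repair(["event_month=2014-05", "event_month=2014-06"], ms))
print(add_partition("event_month=2014-07", ms))
print(show_partitions(ms))
```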

5. Create a multi-level partitioned table

hive (default)> create table order_mulit_partition(orderNumber string,event_time string)PARTITIONED BY(event_month string, step string)row format delimited fields terminated by '\t';
hive (default)> desc formatted order_mulit_partition; (Show detailed information about the partitioned table.)
hive (default)> load data local inpath '/home/hadoop/data/order.txt' into table order_mulit_partition PARTITION (event_month='2014-05',step='1'); (Load the data.)
hive (default)> select * from order_mulit_partition where event_month='2014-05';
10703007267488 2014-05-01 06:01:12.334+01 2014-05 1
10101043505096 2014-05-01 07:28:12.342+01 2014-05 1
10103043509747 2014-05-01 07:50:12.33+01 2014-05 1
10103043501575 2014-05-01 09:27:12.33+01 2014-05 1
10104043514061 2014-05-01 09:03:12.324+01 2014-05 1
[hadoop@hadoop001 data]$ hdfs dfs -ls /user/hive/warehouse/order_mulit_partition/event_month=2014-05
drwxr-xr-x   - hadoop supergroup          0 2018-11-09 16:09 /user/hive/warehouse/order_mulit_partition/event_month=2014-05/step=1 
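(order_mulit_partition is now partitioned on two levels. On HDFS, each partition level maps to one directory, so step=1 is nested inside event_month=2014-05.)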

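Summary: the single-level and multi-level partitions above are both called static partitions. (With static partitioning, the PARTITION clause must spell out the value of every partition column; here both event_month and step must be given.)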
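The static-partition rule in the summary can be sketched as a validation step: every declared partition column needs a value, and the directories nest in declaration order, which matches the event_month=2014-05/step=1 listing above. static_partition_path is a hypothetical helper for illustration, not part of Hive.

```python
from typing import Dict, List


def static_partition_path(table_dir: str, partition_cols: List[str], spec: Dict[str, str]) -> str:
    """Toy check for a fully static PARTITION(...) clause: every partition
    column must have a value; directories nest in declaration order."""
    missing = [c for c in partition_cols if c not in spec]
    if missing:
        raise ValueError(f"no value given for partition column(s): {missing}")
    extra = [c for c in spec if c not in partition_cols]
    if extra:
        raise ValueError(f"not a partition column: {extra}")
    return "/".join([table_dir.rstrip("/")] + [f"{c}={spec[c]}" for c in partition_cols])


print(static_partition_path(
    "/user/hive/warehouse/order_mulit_partition",
    ["event_month", "step"],
    {"event_month": "2014-05", "step": "1"},
))
```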

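hive (default)> show create table ruoze_emp; (Shows the statement originally used to create ruoze_emp. Based on it, the partitioned table ruoze_emp_partition is created as follows:)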
CREATE TABLE `ruoze_emp_partition`(`empno` int, `ename` string, `job` string,`mgr` int, `hiredate` string, `sal` double, `comm` double) partitioned by(`deptno` int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
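(**Note**: a partition column must not also appear among the table's regular columns. Because deptno is the partition column, it has been removed from the column list.)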
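The rule in the note can be expressed as a simple DDL check. This is a toy sketch with a hypothetical function name and our own error text; it assumes, as in Hive, that column names are compared case-insensitively.

```python
from typing import List


def check_partition_columns(columns: List[str], partition_cols: List[str]) -> None:
    """Toy DDL check: a partition column may not repeat a regular column.
    Names are compared case-insensitively, as Hive column names are."""
    regular = {c.lower() for c in columns}
    clash = [p for p in partition_cols if p.lower() in regular]
    if clash:
        raise ValueError(f"partition column(s) also listed as regular columns: {clash}")


emp_cols = ["empno", "ename", "job", "mgr", "hiredate", "sal", "comm"]
check_partition_columns(emp_cols, ["deptno"])  # OK: deptno was removed from the column list
```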

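6. Static partitioning vs. dynamic partitioning
Problem: partition the ruoze_emp table by its department number column, deptno, and write the result into the partitioned table.
Approach 1: write each department number (10, 20, 30) into the partitioned table separately, one statement per department, e.g. for deptno=10: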

hive (default)> insert into table ruoze_emp_partition PARTITION(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from ruoze_emp where deptno=10;

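Suppose there are 1000 deptno values. Adding them one at a time as in approach 1 is impractical; this is the drawback of static partitioning.
Approach 2: use dynamic partitioning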
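To make the drawback concrete, the sketch below generates the approach-1 statements. It only builds strings (static_inserts is a hypothetical helper), but it shows that N departments means N separate statements, each of which reads ruoze_emp again.

```python
from typing import Iterable, List

SELECT_COLS = "empno,ename,job,mgr,hiredate,sal,comm"


def static_inserts(deptnos: Iterable[int]) -> List[str]:
    """One approach-1 statement per department; each scans ruoze_emp again."""
    return [
        f"insert into table ruoze_emp_partition PARTITION(deptno={d}) "
        f"select {SELECT_COLS} from ruoze_emp where deptno={d};"
        for d in deptnos
    ]


for stmt in static_inserts([10, 20, 30]):
    print(stmt)
print(len(static_inserts(range(1000))))  # 1000 statements for 1000 departments
```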

hive (default)> insert overwrite table ruoze_emp_partition PARTITION(deptno)
select empno,ename,job,mgr,hiredate,sal,comm,deptno from ruoze_emp;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
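(It fails because dynamic partitioning defaults to strict mode, which requires at least one static partition column. As the console suggests, run set hive.exec.dynamic.partition.mode=nonstrict and then re-run the insert.)
hive (default)> set hive.exec.dynamic.partition.mode=nonstrict; (This applies to the current session only; to make it global, configure it in hive-site.xml.)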
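The rule behind Error 10096 can be sketched as follows. This is a toy restatement of the message above, not Hive's code: in the hypothetical spec dict, a column mapped to None is dynamic and one with a value is static. Strict mode rejects a PARTITION clause that is entirely dynamic, such as PARTITION(deptno).

```python
from typing import Dict, Optional


def check_dynamic_partition_mode(spec: Dict[str, Optional[str]], mode: str = "strict") -> None:
    """Toy version of the strict-mode rule: None marks a dynamic column.
    In strict mode, at least one partition column must be static."""
    has_dynamic = any(v is None for v in spec.values())
    has_static = any(v is not None for v in spec.values())
    if mode == "strict" and has_dynamic and not has_static:
        raise ValueError("Dynamic partition strict mode requires at least one static partition column")


check_dynamic_partition_mode({"deptno": None}, mode="nonstrict")                  # allowed
check_dynamic_partition_mode({"event_month": "2014-05", "step": None}, "strict")  # allowed: one static column
```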

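Differences from the static approach in approach 1: 1) the PARTITION(deptno) clause gives deptno no concrete value; 2) the partition column deptno is added as the last column of the select list; 3) no where clause is needed to pick a specific deptno from ruoze_emp.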

hive (default)> select * from ruoze_emp_partition;

[hadoop@hadoop001 data]$ hdfs dfs -ls /user/hive/warehouse/ruoze_emp_partition
drwxr-xr-x   - hadoop supergroup          0 2018-11-09 17:08 /user/hive/warehouse/ruoze_emp_partition/deptno=10
drwxr-xr-x   - hadoop supergroup          0 2018-11-09 17:07 /user/hive/warehouse/ruoze_emp_partition/deptno=20
drwxr-xr-x   - hadoop supergroup          0 2018-11-09 17:07 /user/hive/warehouse/ruoze_emp_partition/deptno=30
drwxr-xr-x   - hadoop supergroup          0 2018-11-09 17:07 /user/hive/warehouse/ruoze_emp_partition/deptno=__HIVE_DEFAULT_PARTITION__
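The listing shows one directory per distinct deptno, plus deptno=__HIVE_DEFAULT_PARTITION__. Hive puts rows whose partition value is NULL or empty into that default partition (its name comes from hive.exec.default.partition.name), so presumably some ruoze_emp rows have no deptno. The toy router below sketches the idea: the last select column picks the directory by position and is dropped from the stored row. route_dynamic and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def route_dynamic(rows: Sequence[Tuple], table_dir: str, col: str = "deptno") -> Dict[str, List[Tuple]]:
    """Toy dynamic partitioning: the LAST column of each row decides the
    partition directory and is not stored in the row itself.
    NULL (None) or empty values go to the default partition."""
    out: Dict[str, List[Tuple]] = defaultdict(list)
    for row in rows:
        *data, key = row
        value = DEFAULT_PARTITION if key is None or key == "" else str(key)
        out[f"{table_dir}/{col}={value}"].append(tuple(data))
    return dict(out)


# Hypothetical sample rows: (empno, ename, deptno)
sample = [(7369, "SMITH", 20), (7499, "ALLEN", 30), (8888, "NODEPT", None)]
for path, stored in sorted(route_dynamic(sample, "/user/hive/warehouse/ruoze_emp_partition").items()):
    print(path, stored)
```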
