Static Partitions
1. Create the partitioned table
hive (wzj)> CREATE TABLE emp_partition(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double
> )
> PARTITIONED BY (deptno int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.099 seconds
hive (wzj)>
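Note that the partition column `deptno` is declared in `PARTITIONED BY`, not in the column list: it is stored as a directory level, not inside the data files. A quick way to confirm this is to describe the table; Hive lists the partition column in its own section:

```sql
-- deptno appears under "# Partition Information",
-- separate from the 7 regular columns in the data files.
DESCRIBE emp_partition;
```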
2. Load data
Common error: column count mismatch. The partitioned table we created has only 7 regular columns (deptno is a partition column, not a regular one), while emp has 8.
hive (wzj)> INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=10) select * from emp where deptno=10;
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different '10': Table insclause-0 has 7 columns, but query has 8 columns.
hive (wzj)>
The correct approach:
hive (wzj)> INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp where deptno=10;
Query ID = wzj_20191225145959_6edf4344-c6eb-44c4-aa7c-58ea021621f5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577150260007_0005, Tracking URL = http://hadoop001:38088/proxy/application_1577150260007_0005/
Kill Command = /home/wzj/app/hadoop/bin/hadoop job -kill job_1577150260007_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-12-25 14:59:32,788 Stage-1 map = 0%, reduce = 0%
2019-12-25 14:59:41,973 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.38 sec
MapReduce Total cumulative CPU time: 2 seconds 380 msec
Ended Job = job_1577150260007_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop001:9000/user/hive/warehouse/wzj.db/emp_partition/deptno=10/.hive-staging_hive_2019-12-25_14-59-20_585_5888364998337612461-1/-ext-10000
Loading data to table wzj.emp_partition partition (deptno=10)
Partition wzj.emp_partition{deptno=10} stats: [numFiles=1, numRows=3, totalSize=130, rawDataSize=127]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.38 sec HDFS Read: 5703 HDFS Write: 214 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 380 msec
OK
empno ename job mgr hiredate sal comm
Time taken: 24.067 seconds
hive (wzj)> select * from emp_partition where deptno=10;
OK
emp_partition.empno emp_partition.ename emp_partition.job emp_partition.mgr emp_partition.hiredate emp_partition.sal emp_partition.comm emp_partition.deptno
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
Time taken: 0.152 seconds, Fetched: 3 row(s)
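Besides `INSERT ... SELECT`, a static partition can also be filled directly from a file with `LOAD DATA`. A sketch, assuming a hypothetical tab-delimited local file that contains only the 7 non-partition columns for dept-10 rows:

```sql
-- '/home/wzj/data/emp10.txt' is a hypothetical path; the file must be
-- tab-delimited and contain only the 7 non-partition columns.
LOAD DATA LOCAL INPATH '/home/wzj/data/emp10.txt'
OVERWRITE INTO TABLE emp_partition PARTITION (deptno=10);
```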
If there are many partitions, inserting into them one by one by hand is clearly impractical, so let's look at dynamic partitions.
Dynamic Partitions
Two things to note:
- dynamic partitioning must run in nonstrict mode (the default strict mode rejects a fully dynamic insert)
- the partition column must be the last column in the SELECT list
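Beyond the mode switch, a few related parameters are commonly adjusted when a statement may create many partitions. A sketch of the usual settings (the numeric values here are illustrative, not required defaults):

```sql
SET hive.exec.dynamic.partition=true;              -- enable dynamic partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;    -- allow all partition columns to be dynamic
SET hive.exec.max.dynamic.partitions=1000;         -- max partitions one statement may create overall
SET hive.exec.max.dynamic.partitions.pernode=100;  -- max partitions per mapper/reducer
```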
- Create a partitioned table
hive (wzj)> CREATE TABLE emp_dynamic_partition(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double
> )
> PARTITIONED BY (deptno int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.051 seconds
- Load data
Common error: nonstrict mode has not been set yet
hive (wzj)> INSERT OVERWRITE TABLE emp_dynamic_partition PARTITION (deptno) select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
hive (wzj)> set hive.exec.dynamic.partition.mode=nonstrict;
hive (wzj)> INSERT OVERWRITE TABLE emp_dynamic_partition PARTITION (deptno) select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
Query ID = wzj_20191225150808_7a52fc9e-6e43-4261-b0ac-a40ed2a332fe
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577150260007_0006, Tracking URL = http://hadoop001:38088/proxy/application_1577150260007_0006/
Kill Command = /home/wzj/app/hadoop/bin/hadoop job -kill job_1577150260007_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-12-25 15:08:33,570 Stage-1 map = 0%, reduce = 0%
2019-12-25 15:08:42,569 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.41 sec
MapReduce Total cumulative CPU time: 1 seconds 410 msec
Ended Job = job_1577150260007_0006
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop001:9000/user/hive/warehouse/wzj.db/emp_dynamic_partition/.hive-staging_hive_2019-12-25_15-08-23_320_6576263200155603056-1/-ext-10000
Loading data to table wzj.emp_dynamic_partition partition (deptno=null)
Time taken for load dynamic partitions : 559
Loading partition {deptno=10}
Loading partition {deptno=__HIVE_DEFAULT_PARTITION__}
Loading partition {deptno=20}
Loading partition {deptno=30}
Time taken for adding to write entity : 1
Partition wzj.emp_dynamic_partition{deptno=10} stats: [numFiles=1, numRows=3, totalSize=130, rawDataSize=127]
Partition wzj.emp_dynamic_partition{deptno=20} stats: [numFiles=1, numRows=5, totalSize=214, rawDataSize=209]
Partition wzj.emp_dynamic_partition{deptno=30} stats: [numFiles=1, numRows=6, totalSize=275, rawDataSize=269]
Partition wzj.emp_dynamic_partition{deptno=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=1, totalSize=44, rawDataSize=43]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.41 sec HDFS Read: 5589 HDFS Write: 943 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 410 msec
OK
empno ename job mgr hiredate sal comm deptno
Time taken: 21.61 seconds
hive (wzj)> show partitions emp_dynamic_partition;
OK
partition
deptno=10
deptno=20
deptno=30
deptno=__HIVE_DEFAULT_PARTITION__
Time taken: 0.081 seconds, Fetched: 4 row(s)
hive (wzj)>
As shown above, every partition was created successfully and the data loaded correctly.
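One follow-up worth knowing: the `__HIVE_DEFAULT_PARTITION__` partition holds the row whose `deptno` was NULL in the source table. If that NULL-keyed data is not wanted, the partition can be dropped; a sketch:

```sql
-- Rows with a NULL partition value land in __HIVE_DEFAULT_PARTITION__;
-- drop the partition if that data is not needed.
ALTER TABLE emp_dynamic_partition
  DROP PARTITION (deptno='__HIVE_DEFAULT_PARTITION__');
```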