Hive Static Partitions & Dynamic Partitions (Including Common Errors)

Static partitions

1. Create a partitioned table

hive (wzj)> CREATE TABLE emp_partition(
          > empno int,
          > ename string,
          > job string,
          > mgr int,
          > hiredate string,
          > sal double,
          > comm double
          > ) 
          > PARTITIONED BY (deptno int)
          > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.099 seconds
hive (wzj)> 

2. Load data

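Common error: column count mismatch. The partitioned table has only 7 regular columns (deptno is a partition column and its value comes from the PARTITION clause), while emp has 8 columns, so select * returns one column too many.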

hive (wzj)> INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=10) select * from emp where deptno=10;
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different '10': Table insclause-0 has 7 columns, but query has 8 columns.
hive (wzj)> 
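
To see where the 7 versus 8 comes from, it can help to look at how Hive describes the target table. A minimal sketch using the standard DESCRIBE statement; the exact layout of the output varies by Hive version, so none is shown here:

```sql
-- List the columns of the target table. deptno should appear in the
-- partition information section, separate from the 7 regular columns
-- that an INSERT ... SELECT has to supply.
DESCRIBE emp_partition;

-- Compare with the source table, which stores deptno as an ordinary column.
DESCRIBE emp;
```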

Correct approach: list the 7 regular columns explicitly and leave deptno out of the select list.

hive (wzj)> INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=10) select empno,ename,job,mgr,hiredate,sal,comm   from emp where deptno=10;
Query ID = wzj_20191225145959_6edf4344-c6eb-44c4-aa7c-58ea021621f5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577150260007_0005, Tracking URL = http://hadoop001:38088/proxy/application_1577150260007_0005/
Kill Command = /home/wzj/app/hadoop/bin/hadoop job  -kill job_1577150260007_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-12-25 14:59:32,788 Stage-1 map = 0%,  reduce = 0%
2019-12-25 14:59:41,973 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.38 sec
MapReduce Total cumulative CPU time: 2 seconds 380 msec
Ended Job = job_1577150260007_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop001:9000/user/hive/warehouse/wzj.db/emp_partition/deptno=10/.hive-staging_hive_2019-12-25_14-59-20_585_5888364998337612461-1/-ext-10000
Loading data to table wzj.emp_partition partition (deptno=10)
Partition wzj.emp_partition{deptno=10} stats: [numFiles=1, numRows=3, totalSize=130, rawDataSize=127]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.38 sec   HDFS Read: 5703 HDFS Write: 214 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 380 msec
OK
empno	ename	job	mgr	hiredate	sal	comm
Time taken: 24.067 seconds
hive (wzj)> select * from emp_partition where deptno=10;
OK
emp_partition.empno	emp_partition.ename	emp_partition.job	emp_partition.mgr	emp_partition.hiredate	emp_partition.sal	emp_partition.comm	emp_partition.deptno
7782	CLARK	MANAGER	7839	1981-6-9	2450.0	NULL	10
7839	KING	PRESIDENT	NULL	1981-11-17	5000.0	NULL	10
7934	MILLER	CLERK	7782	1982-1-23	1300.0	NULL	10
Time taken: 0.152 seconds, Fetched: 3 row(s)

When there are many partitions, inserting into each one by hand is clearly impractical, so let's look at dynamic partitions.
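
For comparison, here is a sketch of what covering two more departments by hand would look like with static partitions. It uses Hive's multi-insert form so that emp is scanned once; the department numbers 20 and 30 are the other values that show up in emp later in this post. Every additional department needs another hand-written clause, which is the part that does not scale:

```sql
-- One scan of emp, one hand-written INSERT clause per static partition.
FROM emp
INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=20)
  SELECT empno, ename, job, mgr, hiredate, sal, comm
  WHERE deptno = 20
INSERT OVERWRITE TABLE emp_partition PARTITION (deptno=30)
  SELECT empno, ename, job, mgr, hiredate, sal, comm
  WHERE deptno = 30;
```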

Dynamic partitions

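Two points to note:

  • A dynamic partition insert with no static partition column requires non-strict mode (hive.exec.dynamic.partition.mode=nonstrict)
  • The partition column must be the last column in the SELECT list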
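
Both points can be seen together in a mixed static/dynamic insert. The sketch below is not part of the original walkthrough: emp_two_level is a hypothetical table invented for illustration, partitioned by deptno and then job. Because deptno is given a fixed value, strict mode is satisfied, and job is filled in dynamically from the last expression in the select list (dynamic partition columns are matched by position, not by name):

```sql
-- Hypothetical two-level table, assumed for illustration only.
CREATE TABLE emp_two_level (
  empno int,
  ename string,
  mgr int,
  hiredate string,
  sal double,
  comm double
)
PARTITIONED BY (deptno int, job string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- deptno is static (value fixed in the PARTITION clause),
-- job is dynamic (taken from the last select expression).
-- Static partition columns must come before dynamic ones.
INSERT OVERWRITE TABLE emp_two_level PARTITION (deptno=10, job)
SELECT empno, ename, mgr, hiredate, sal, comm, job
FROM emp
WHERE deptno = 10;
```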
  1. Create a partitioned table
hive (wzj)> CREATE TABLE emp_dynamic_partition(
          > empno int,
          > ename string,
          > job string,
          > mgr int,
          > hiredate string,
          > sal double,
          > comm double
          > ) 
          > PARTITIONED BY (deptno int)
          > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.051 seconds
  2. Load data

Common error: the session is still in strict mode, so non-strict mode has to be set first
hive (wzj)> INSERT OVERWRITE TABLE emp_dynamic_partition PARTITION (deptno) select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

hive (wzj)> set hive.exec.dynamic.partition.mode=nonstrict;
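
The SET above applies to the current session only. As a sketch, these are the related session settings that usually matter for dynamic partition inserts. The property names are standard Hive configuration keys; the numeric values are what I understand to be the stock defaults and are an assumption here, so check them against your own installation:

```sql
-- Print the current value without changing it.
SET hive.exec.dynamic.partition.mode;

-- Enable dynamic partitioning (already on by default in recent versions).
SET hive.exec.dynamic.partition=true;

-- Allow every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Upper bounds on how many partitions one statement may create
-- (assumed defaults; raise them only if a load legitimately needs more).
SET hive.exec.max.dynamic.partitions=1000;
SET hive.exec.max.dynamic.partitions.pernode=100;
```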

hive (wzj)> INSERT OVERWRITE TABLE emp_dynamic_partition PARTITION (deptno) select empno,ename,job,mgr,hiredate,sal,comm,deptno from emp;
Query ID = wzj_20191225150808_7a52fc9e-6e43-4261-b0ac-a40ed2a332fe
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577150260007_0006, Tracking URL = http://hadoop001:38088/proxy/application_1577150260007_0006/
Kill Command = /home/wzj/app/hadoop/bin/hadoop job  -kill job_1577150260007_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-12-25 15:08:33,570 Stage-1 map = 0%,  reduce = 0%
2019-12-25 15:08:42,569 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.41 sec
MapReduce Total cumulative CPU time: 1 seconds 410 msec
Ended Job = job_1577150260007_0006
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop001:9000/user/hive/warehouse/wzj.db/emp_dynamic_partition/.hive-staging_hive_2019-12-25_15-08-23_320_6576263200155603056-1/-ext-10000
Loading data to table wzj.emp_dynamic_partition partition (deptno=null)
	 Time taken for load dynamic partitions : 559
	Loading partition {deptno=10}
	Loading partition {deptno=__HIVE_DEFAULT_PARTITION__}
	Loading partition {deptno=20}
	Loading partition {deptno=30}
	 Time taken for adding to write entity : 1
Partition wzj.emp_dynamic_partition{deptno=10} stats: [numFiles=1, numRows=3, totalSize=130, rawDataSize=127]
Partition wzj.emp_dynamic_partition{deptno=20} stats: [numFiles=1, numRows=5, totalSize=214, rawDataSize=209]
Partition wzj.emp_dynamic_partition{deptno=30} stats: [numFiles=1, numRows=6, totalSize=275, rawDataSize=269]
Partition wzj.emp_dynamic_partition{deptno=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=1, totalSize=44, rawDataSize=43]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.41 sec   HDFS Read: 5589 HDFS Write: 943 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 410 msec
OK
empno	ename	job	mgr	hiredate	sal	comm	deptno
Time taken: 21.61 seconds

hive (wzj)> show partitions emp_dynamic_partition;
OK
partition
deptno=10
deptno=20
deptno=30
deptno=__HIVE_DEFAULT_PARTITION__
Time taken: 0.081 seconds, Fetched: 4 row(s)
hive (wzj)>

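As shown above, every partition was created and its data loaded correctly. The single row in __HIVE_DEFAULT_PARTITION__ is the one whose deptno has no value in emp; Hive places such rows in that default partition.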
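
As a quick cross-check, the per-partition row counts can be compared against the source table. This is a minimal sketch; the counts should line up with the numRows values in the load log above (3, 5, 6 and 1), and the default partition's deptno is typically displayed as NULL:

```sql
-- Row count per partition in the dynamically partitioned table.
SELECT deptno, count(*) AS row_count
FROM emp_dynamic_partition
GROUP BY deptno;

-- Same grouping on the source table, for comparison.
SELECT deptno, count(*) AS row_count
FROM emp
GROUP BY deptno;
```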

Reposted from blog.csdn.net/wzj_wp/article/details/103699228