DDL Data Definition (Part 2): Table Partitions


Partitioned Tables

How it works:

A partitioned table corresponds to an independent folder in the HDFS file system, and that folder holds all of the partition's data files. A partition in Hive is simply a subdirectory: a large data set is split into smaller data sets according to business needs, and at query time the expressions in the WHERE clause select only the partitions that are needed, which makes queries much more efficient.

Basic operations on partitioned tables

  1. Introducing partitioned tables (managing logs by date)

/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log

  2. Syntax for creating a partitioned table

     hive (default)> create table dept_partition(
                    deptno int, dname string, loc string
                    )
                    partitioned by (month string)
                    row format delimited fields terminated by '\t';
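
     The partition column is declared in partitioned by rather than in the column list, but Hive still exposes it as if it were an ordinary column of the table. A quick way to confirm this (a sketch of the output, trimmed; the exact layout varies between Hive versions):

     hive (default)> desc dept_partition;
     OK
     deptno                  int
     dname                   string
     loc                     string
     month                   string

     # Partition Information
     # col_name              data_type
     month                   string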
    
  3. Load data into the partitioned table

      hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201709');
      hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201708');
      hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201707');
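
      Each load creates (or adds files to) a month=... subdirectory under the table's directory. Assuming the table lives in the hive database, the HDFS layout afterwards looks roughly like this (a sketch, paths abbreviated):

      hive (hive)> dfs -ls /user/hive/warehouse/hive.db/dept_partition;
      .../dept_partition/month=201707
      .../dept_partition/month=201708
      .../dept_partition/month=201709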
    
  4. Query data in the partitioned table
    Single-partition query

     select * from dept_partition where month='201709';
    

Multi-partition query (use union to combine multiple select statements)

	select * from dept_partition where month='201709' union select * from dept_partition where month='201708';
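
A union is not strictly required just to read several partitions: a single WHERE clause that matches more than one partition value also prunes down to just those directories. Note that union (without all) removes duplicate rows, while the filter below keeps every row:

	select * from dept_partition where month='201709' or month='201708';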
  5. Add partitions
    Create a single partition

     hive (hive)> alter table dept_partition add partition(month='201705');
    

    Create multiple partitions in one statement (the partition specs are separated by spaces, not commas)

     hive (hive)> alter table dept_partition add partition(month='201704') partition(month='201703');
    
  6. Drop partitions
    Drop a single partition

     hive (hive)> alter table dept_partition drop partition(month='201705');
    

Drop multiple partitions (note the comma between the partition specs)

	hive (hive)> alter table dept_partition drop partition(month='201704'), partition(month='201703');
  7. Show the partitions of a table

     hive (hive)> show partitions dept_partition;
     OK
     partition
     month=201706
     month=201707
     month=201708
     month=201709
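
     To check whether one specific partition exists without listing them all, show partitions also accepts a partition spec (a sketch):

     hive (hive)> show partitions dept_partition partition(month='201709');
     OK
     partition
     month=201709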
    
  8. View the structure of the partitioned table

    hive (hive)> desc formatted dept_partition;
    

Dynamic partitions

1. Enable dynamic partitioning

set hive.exec.dynamic.partition=true;

2. Set the dynamic partition mode

set hive.exec.dynamic.partition.mode=nonstrict;

The default is strict, which requires at least one partition key to be specified as a static partition.
nonstrict mode allows every partition key to be filled in dynamically.
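
A sketch of the difference, using a hypothetical table t_part partitioned by (month string, day string) and a hypothetical source table src: in strict mode at least one (leading) partition key must be given a constant value, while nonstrict mode lets every key come from the query result:

-- works in strict mode: month is static, day is dynamic
insert into table t_part partition(month='201709', day)
select deptno, dname, loc, day from src;

-- requires nonstrict mode: both keys are dynamic
insert into table t_part partition(month, day)
select deptno, dname, loc, month, day from src;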

3. Data source

1,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08
2,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai,2018-08-09
3,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing,2018-08-10
4,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08
5,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai,2018-08-09
6,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing,2018-08-10

4. Create the source table

create table person1(
id int,
name string,
age int,
likes array<string>,
address map<string,string>,
dt string
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

5. Load the data into the source table

load data local inpath '/test/person.txt' into table person1;
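
To confirm that the array and map delimiters were parsed as intended, a quick check (a sketch; the exact output depends on the file that was loaded):

select id, name, likes[0], address['work_addr'], dt from person1 limit 2;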

6. Create the partitioned table

create table datap(
id int,
name string,
age int,
likes array<string>,
address map<string,string>
)
partitioned by (dt string)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

7. Insert the data into the partitioned table (dynamic partitioning on dt)

insert into datap partition(dt) select id,name,age,likes,address,dt from person1 distribute by dt;

8. View the partition information
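
Based on the dt values in the sample data, the partition list should look roughly like this (a sketch of the show partitions output):

show partitions datap;
OK
partition
dt=2018-08-08
dt=2018-08-09
dt=2018-08-10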

Two-level partitioned tables

Create a two-level partitioned table (the only change is an extra column in the partitioned by clause)

create table dept_partition2(
deptno int,
dname string,
loc string
)
partitioned by (month string,day string)
row format delimited fields terminated by '\t';

Load data the normal way

load data local inpath '/opt/datas/dept.txt' 
into table dept_partition2 partition(month='201709',day='13');
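
With two partition keys, the directories nest one level per key. Assuming the table lives in the hive database, the layout on HDFS after the load looks roughly like this (a sketch):

hive (hive)> dfs -ls /user/hive/warehouse/hive.db/dept_partition2/month=201709;
/user/hive/warehouse/hive.db/dept_partition2/month=201709/day=13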

Query the partition data

hive (hive)> select * from dept_partition2;

Three ways to upload data directly into a partition directory and make the partitioned table recognize it

Method 1: upload the data, then repair the table
Upload the data

hive (hive)> dfs -mkdir -p /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=12;
hive (hive)> dfs -put /opt/datas/dept.txt /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=12;

Query the data (older Hive versions cannot see the newly uploaded data, because the partition is not yet registered in the metastore)

hive (hive)>  select * from dept_partition2 where month='201709' and day='12';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.month	dept_partition2.day
Time taken: 2.766 seconds

Run the repair command (msck repair scans the table's directory on HDFS and adds any partition directories that are missing from the metastore)

hive (hive)> msck repair table dept_partition2;

Query again

hive (hive)>  select * from dept_partition2 where month='201709' and day='12';

Method 2: upload the data, then add the partition
Upload the data

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=11;

Add the partition

hive (default)> alter table dept_partition2 add partition(month='201709', day='11');

Query the data

hive (default)> select * from dept_partition2 where month='201709' and day='11';

Method 3: create the directory, then load the data into the partition

Create the directory

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;

Load the data

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');

Query the data

hive (default)> select * from dept_partition2 where month='201709' and day='10';
