HiveQL DDL—表

创建表

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE TABLE 创建指定名字的表。如果相同名字的表已经存在,则会报错;用户可以用 IF NOT EXISTS 选项跳过这个错误。表名和列名是大小不敏感的,而SerDe和属性名称是大小写敏感的。
EXTERNAL 关键字指定创建一个外部表,默认是内部表(也称为管理表),外部表在建表的必须同时指定一个指向实际数据的路径(LOCATION),Hive 创建内部表时,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不对数据的位置做任何改变。在删除表的时候,内部表的元数据和数据会被一起删除,而外部表只删除元数据,不删除数据。
TEMPORARY 关键字指定创建一个临时表,临时表只对当前的用户会话可见,会话结束后,表会被自动删除,临时表不支持分区和索引。
COMMENT 子句是给字段或者表添加注释信息的,注释信息使用单引号包括的。注释对于表的计算没有影响,但是为了后期的维护,所以实际开发都是必须要加COMMENT的。
PARTITIONED BY 是给表指定分区,决定了表是否是分区表。Hive中所谓分区表就是将表里新增加一个字段,就是分区的名字,这样你在操作表中的数据时,可以按分区字段进行过滤。
[ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ ] 指定表字段的分隔符,默认是\001,这里指定的是逗号分隔符,还可以指定其他列的分隔符。
STORED AS SEQUENCEFILE|TEXTFILE|RCFILE 如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,使用 STORED AS SEQUENCEFILE。
CLUSTERED BY 对于每一个表(table)或者分区, Hive可以进一步组织成桶,也就是说桶是更为细粒度的数据范围划分。Hive也针对某一列进行桶的组织。Hive采用对列值哈希,然后除以桶的个数求余的方式决定该条记录存放在哪个桶当中
LOCATION 是指定表在HDFS上的存储路径,内部表如果不指定,会使用默认路径,如果指定的路径不存在会自动创建;对于外部表,则需要直接指定一个路径。
TBLPROPERTIES 用以指定表的属性,创建表时会自动添加一些预定义的属性,如 last_modified_user 和 last_modified_time。

  创建示例数据文件

$ vim  /home/hadoop/employee.txt
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer^DLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

  示例: 创建内部表

> CREATE TABLE IF NOT EXISTS employee_internal (
name STRING COMMENT 'this is optinal column comments',
work_place ARRAY<STRING>,-- 字段是大小写不敏感的
gender_age STRUCT<gender:STRING,age:INT>,
skills_score MAP<STRING,INT>, 
depart_title MAP<STRING,ARRAY<STRING>> --最后一个字段没有","
)
COMMENT 'This is an internal table'-- 表的注释信息
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|' -- 字段分隔符
COLLECTION ITEMS TERMINATED BY ','-- 集合元素的分隔符
MAP KEYS TERMINATED BY ':' -- MAP类型key/value的分隔符
STORED as TEXTFILE; -- 表的文件格式

  加载数据

> LOAD DATA INPATH '/tmp/hive/exployee.txt'
 OVERWRITE INTO TABLE employee_internal;

  示例:创建外部表

> CREATE EXTERNAL TABLE employee_external ( -- 使用了EXTERNAL关键字
name string,
work_place ARRAY<string>,
gender_age STRUCT<gender:string,age:int>,
skills_score MAP<string,int>,
depart_title MAP<STRING,ARRAY<STRING>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED as TEXTFILE
LOCATION '/tmp/employee'; -- 指定数据目录路径

  加载数据

> LOAD DATA INPATH '/tmp/employee.txt'
 OVERWRITE INTO TABLE employee_external;

  示例:创建临时表

> CREATE TEMPORARY TABLE IF NOT EXISTS tmp_emp1 (   -- 使用了TEMPORARY关键字
name string,
work_place ARRAY<string>,
gender_age STRUCT<gender:string,age:int>,
skills_score MAP<string,int>,
depart_title MAP<STRING,ARRAY<STRING>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED as TEXTFILE;

  加载数据

> LOAD DATA INPATH '/tmp/employee.txt'
 OVERWRITE INTO TABLE tmp_emp1;

使用 CTAS 建表

  除了上面的建表语句外,hive还可以使用CTAS(Create-Table-As-Select)的方式建表。来查询结果生成之前,使用CTAS创建的表对其他用户是不可见的。使用 CTAS 创建表有一些限制:

  • 创建的表不能是分区表
  • 创建的表不能是外部表
  • 创建的表不能是分桶表

  CTAS 建表通常会触发一个yarn作业,尽管 SELECT * 语句本身不会触发任何的yarn作业。

  示例:CTAS 建表

> CREATE TEMPORARY TABLE tmp_emp2 as SELECT * FROM tmp_emp1;

  上面创建的表tmp_emp2,创建完成后会有数据。如果想要创建一张空表呢,可以使用如下的语句

> CREATE TEMPORARY TABLE tmp_emp3 as SELECT * FROM tmp_emp1 where 1=2;  --where条件为false

  CTAS还可以和CTE(Common Table Expression)一起使用。CTE是一个WITH子句中指定的简单SELECT查询派生的临时结果集,后跟用于生成结果集的SELECT或INSERT语句。CTE仅在单个语句的执行范围内定义。一个或多个CTE可以与关键字(如SELECT、INSERT、CREATE TABLE AS SELECT或CREATE VIEW AS SELECT语句)一起嵌套或链式使用。与编写复杂的嵌套查询相比,使用HQL的CTE会使查询更加简洁明了。

  示例

> CREATE TABLE cte_employee as
 WITH r1 as (
 SELECT name FROM r2 WHERE name = 'Michael'
 ),
 r2 as (
 SELECT name FROM employee_external WHERE gender_age.gender= 'Male'
 ),
 r3 as (
 SELECT name FROM employee_external WHERE gender_age.gender= 'Female'
 )
 SELECT * FROM r1
 UNION ALL
 SELECT * FROM r3;

> select * from cte_employee;
+--------------------+
| cte_employee.name  |
+--------------------+
| Michael            |
| Shelley            |
| Lucy               |
+--------------------+

使用 LIKE 子句建表

  另外,还可以使用 LIKE子句来创建表,这种方式使用已有的表结构来创建一个新表

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

  示例

> CREATE TEMPORARY TABLE tmp_emp4 like tmp_emp1;

查看表

使用SHOW查看表

查看数据库中的表

  查看当前数据库或者IN子句指定数据库中的表

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];

  示例

> show tables;
> show tables in default;
> show tables like "tmp*";
> show tables like "tmp*|emp*";

查看表的详细信息

SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)];

  示例

> SHOW TABLE EXTENDED LIKE tmp_emp1;
+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| tableName:tmp_emp1                                 |
| owner:hadoop                                       |
| location:hdfs://ns001/user/hive/tmp/hadoop/33874d02-b862-4099-83c0-ec1c2b19479f/_tmp_space.db/a64106b0-2d37-4705-804c-3ea305f4dad2 |
| inputformat:org.apache.hadoop.mapred.TextInputFormat |
| outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| columns:struct columns { string name, list<string> work_place, struct<gender:string,age:i32> gender_age, map<string,i32> skills_score, map<string,list<string>> depart_title} |
| partitioned:false                                  |
| partitionColumns:                                  |
| totalNumberFiles:1                                 |
| totalFileSize:228                                  |
| maxFileSize:228                                    |
| minFileSize:228                                    |
| lastAccessTime:1569397077750                       |
| lastUpdateTime:1569397092995                       |
|                                                    |
+----------------------------------------------------+

查看表的字段信息

SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name]  [ LIKE 'pattern_with_wildcards']; --Added in Hive 3.0 by HIVE-18373.

  示例

> SHOW COLUMNS IN employee_internal;
+---------------+
|     field     |
+---------------+
| depart_title  |
| gender_age    |
| name          |
| skills_score  |
| work_place    |
+---------------+

查看建表语句

  示例

> SHOW CREATE TABLE employee_internal;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `employee_internal`(                  |
|   `name` string COMMENT 'this is optinal column comments',  |
|   `work_place` array<string>,                      |
|   `gender_age` struct<gender:string,age:int>,      |
|   `skills_score` map<string,int>,                  |
|   `depart_title` map<string,array<string>>)        |
| COMMENT 'This is an internal table'                |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| WITH SERDEPROPERTIES (                             |
|   'collection.delim'=',',                          |
|   'field.delim'='|',                               |
|   'mapkey.delim'=':',                              |
|   'serialization.format'='|')                      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'hdfs://ns001/tmp/hive/employee_internal'       |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'transient_lastDdlTime'='1569395848')            |
+----------------------------------------------------+

查看表属性

SHOW TBLPROPERTIES tblname;
SHOW TBLPROPERTIES tblname("foo");

  示例

> SHOW TBLPROPERTIES employee_internal;
+------------------------+----------------------------+
|       prpt_name        |         prpt_value         |
+------------------------+----------------------------+
| bucketing_version      | 2                          |
| comment                | This is an internal table  |
| numFiles               | 1                          |
| numRows                | 0                          |
| rawDataSize            | 0                          |
| totalSize              | 228                        |
| transient_lastDdlTime  | 1569395848                 |
+------------------------+----------------------------+

使用DESCRIBE查看表信息

  在 Hive 1.x.x and 0.x.x 中描述table/view/column的语法有2种形式,主要区别是是否指定了数据库。
  如果没有指定数据库,则在点(.)后面提供可选的列信息。

DESCRIBE [EXTENDED|FORMATTED] 
  table_name[.col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];

  如果指定了数据库,则在空格后提供可选的列信息

DESCRIBE [EXTENDED|FORMATTED] 
  [db_name.]table_name[ col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];

  在hive2.0之后,描述table的命令语法发生了改变,语法向后不兼容。

-- 不支持字段和表名使用一个点(DOT)分隔这种格式
-- 分区说明信息是在表名之后、字段之前;之前的版本中,字段信息在表名和分区说明之间。
DESCRIBE [EXTENDED | FORMATTED]
    [db_name.]table_name [PARTITION partition_spec] [col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];

  DESCRIBE语句显示给定表包括分区字段在内的所有字段,如果使用了EXTENDED关键字,则以Thrift序列化形式显示表的元数据,这通常只对调试有用,不适用于一般用途。如果使用 FORMATTED 关键字,则以表格形式显示元数据。
  示例

> DESCRIBE employee_internal;
+---------------+--------------------------------+----------------------------------+
|   col_name    |           data_type            |             comment              |
+---------------+--------------------------------+----------------------------------+
| name          | string                         | this is optinal column comments  |
| work_place    | array<string>                  |                                  |
| gender_age    | struct<gender:string,age:int>  |                                  |
| skills_score  | map<string,int>                |                                  |
| depart_title  | map<string,array<string>>      |                                  |
+---------------+--------------------------------+----------------------------------+
> DESCRIBE EXTENDED employee_internal;
> DESCRIBE FORMATTED employee_internal;

  如果表有复合类型的字段,可以通过使用表名 复合字段名(field_name用于struct,’ e l e m elem ‘用于array,’ k e y key ‘用于map的key,’ v a l u e value '用于map的value)查看该列的属性。可以递归地指定这一语法,以剖析更复杂的字段类型。

  示例

> DESCRIBE employee_internal skills_score.$key$;
+-----------+------------+--------------------+
| col_name  | data_type  |      comment       |
+-----------+------------+--------------------+
| $key$     | string     | from deserializer  |
+-----------+------------+--------------------+

> DESCRIBE employee_internal skills_score.$value$;
+-----------+------------+--------------------+
| col_name  | data_type  |      comment       |
+-----------+------------+--------------------+
| $value$   | int        | from deserializer  |
+-----------+------------+--------------------+

> DESCRIBE employee_internal gender_age.field_name;
+-----------+------------+--------------------+
| col_name  | data_type  |      comment       |
+-----------+------------+--------------------+
| gender    | string     | from deserializer  |
+-----------+------------+--------------------+

> DESCRIBE employee_internal work_place.$elem$;
+-----------+------------+--------------------+
| col_name  | data_type  |      comment       |
+-----------+------------+--------------------+
| $elem$    | string     | from deserializer  |
+-----------+------------+--------------------+

清理表

  清理表有2中方式,一种是DROP TABLE,另一种是TRUNCATE TABLE

DROP TABLE

  对于内部表,DROP TABLE会把表的元数据信息和表数据都删除,删除的数据通常会被移动到当前用户家目录下的.trash中(如果配置了回收站)。对于外部表,DROP TABLE只会删除表的元数据,而保留表的数据。

DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)

TRUNCATE TABLE

  TRUNCATE TABLE 会把表的数据清空,但是会保留表。也就是将表变为一张空表。TRUNCATE TABLE只能用于内部表。

TRUNCATE TABLE table_name [PARTITION partition_spec];

  示例

> TRUNCATE TABLE cte_employee;
> SELECT name FROM cte_employee;
+--------------------+
| cte_employee.name |
+--------------------+
+--------------------+

修改表

  使用ALTER可以对表进行修改,比如修改表名,修改表属性,添加字段,修改字段数据类型等等。需要注意的是,ALTER TABLE修改的只是表的元数据,不会对表数据产生影响,用户需要手动确保实际数据符合元数据的定义,否则查询不会返回预期的任何结果。

修改表名

ALTER TABLE table_name RENAME TO new_table_name;

  示例

> ALTER TABLE cte_employee RENAME TO cte_employee_backup;

修改表属性

ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
  : (property_name = property_value, property_name = property_value, ... )

  示例:修改注释信息

> ALTER TABLE employee_internal SET TBLPROPERTIES ('comment' = 'Alter comment test');
> SHOW TBLPROPERTIES  employee_internal;
+------------------------+---------------------+
|       prpt_name        |     prpt_value      |
+------------------------+---------------------+
| bucketing_version      | 2                   |
| comment                | Alter comment test  |
| last_modified_by       | hadoop              |
| last_modified_time     | 1569463293          |
| numFiles               | 1                   |
| numRows                | 0                   |
| rawDataSize            | 0                   |
| totalSize              | 228                 |
| transient_lastDdlTime  | 1569463293          |
+------------------------+---------------------+

修改表的SerDe属性

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
 
serde_properties:
  : (property_name = property_value, property_name = property_value, ... )

  上述语句可以修改表的SerDe或给表的SerDe对象添加用户自定义的元数据。property_name和property_value都必须加引号。
  示例: 修改表的字段分隔符(省略了不相关的输出信息)

> create table employee_external_copy1 like employee_external;
> show create table employee_external_copy1;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
...
| WITH SERDEPROPERTIES (                             |
|   'collection.delim'=',',                          |
|   'field.delim'='|',                               |
|   'mapkey.delim'=':',                              |
|   'serialization.format'='|')                      |
...
+----------------------------------------------------+

> ALTER TABLE employee_external_copy1 SET SERDEPROPERTIES ('field.delim'='&');
> show create table employee_external_copy1;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
...
| WITH SERDEPROPERTIES (                             |
|   'collection.delim'=',',                          |
|   'field.delim'='&',                               |
|   'mapkey.delim'=':',                              |
|   'serialization.format'='|')                      |
...
+----------------------------------------------------+

修改表的文件格式

ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;

  示例

> show create table employee_external_copy1;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
...
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
...
+----------------------------------------------------+
> ALTER TABLE employee_external_copy1 SET FILEFORMAT RCFILE;
> show create table employee_external_copy1;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
...
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.RCFileInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' |
...
+----------------------------------------------------+

修改表的存储位置

ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location";

注意:修改表的存储位置并不会对已有的数据文件进行移动,用户需要自己确保数据文件符合表的位置。

  示例

> show create table employee_external_copy1;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
...
| LOCATION                                           |
|   'hdfs://ns001/tmp/hive/employee_external_copy1' |
...
> ALTER TABLE employee_external_copy1 SET LOCATION '/tmp/employee_external_copy1';
> show create table employee_external_copy1;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
...

| LOCATION                                           |
|   'hdfs://ns001/tmp/employee_external_copy1'                              |
...
+----------------------------------------------------+

合并表中的小文件

  在RCFile 或者 ORC存储格式的表中,如果有很多的小文件,可以使用 CONCATENATE 选项对小文件进行合并。从Hive 0.8.0开始,RCfile被添加到支持使用 concatenate 选项快速在块级合并小RCile。从v0.14.0开始,添加了ORC文件支持使用concatenate选项对小型ORC文件进行快速条带级合并。目前还不支持其他的文件格式。RCfile在块级合并,ORC文件在条带级合并,可以避免数据解压缩和解码的开销。合并小文件会触发一个yarn作业。

ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;

  示例

> ALTER TABLE employee_external_copy1 CONCATENATE;

修改字段的名称、类型、位置、注释

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

  CASCADE | RESTRICT子句在Hive1.1.0中可用。ALTER TABLE CHANGE COLUMN使用CASCADE更改表元数据的列信息,同时对所有分区元数据进行相同的更改。RESTRICT是默认选项,仅修改表中的字段信息。

  示例1: 修改字段名称,将字段那么改为employee_name

> desc employee_external_copy1
+---------------+--------------------------------+----------+
|   col_name    |           data_type            | comment  |
+---------------+--------------------------------+----------+
| name          | string                         |          |
| work_place    | array<string>                  |          |
| gender_age    | struct<gender:string,age:int>  |          |
| skills_score  | map<string,int>                |          |
| depart_title  | map<string,array<string>>      |          |
+---------------+--------------------------------+----------+
> ALTER TABLE employee_external_copy1  CHANGE COLUMN name employee_name string;
> desc employee_external_copy1
+----------------+--------------------------------+----------+
|    col_name    |           data_type            | comment  |
+----------------+--------------------------------+----------+
| employee_name  | string                         |          |
| work_place     | array<string>                  |          |
| gender_age     | struct<gender:string,age:int>  |          |
| skills_score   | map<string,int>                |          |
| depart_title   | map<string,array<string>>      |          |
+----------------+--------------------------------+----------+

  示例2:将字段名改回为原来的名称,并修改注释信息

> ALTER TABLE employee_external_copy1  CHANGE COLUMN employee_name name string COMMENT 'employ name';
>  desc employee_external_copy1
+---------------+--------------------------------+--------------+
|   col_name    |           data_type            |   comment    |
+---------------+--------------------------------+--------------+
| name          | string                         | employ name  |
| work_place    | array<string>                  |              |
| gender_age    | struct<gender:string,age:int>  |              |
| skills_score  | map<string,int>                |              |
| depart_title  | map<string,array<string>>      |              |
+---------------+--------------------------------+--------------+

  示例3:修改字段位置(出现了由于字段类型不兼容导致的报错信息)

> ALTER TABLE employee_external_copy1  CHANGE COLUMN name employee_name string after  work_place;
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
work_place,employee_name (state=08S01,code=1)

添加/替换字段

ALTER TABLE table_name 
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)

ADD COLUMNS 可在现有字段之后、分区字段之前添加新的字段;
REPLACE COLUMNS 会删除所有现有字段,并新增一组字段。字段替换只支持使用了本地 SerDe(DynamicSerDe,MetadataTypedColumnsetSerDe,LazySimpleSerDe和ColumnarSerDe)的表。REPLACE COLUMNS还可以用来删除列。
CASCADE | RESTRICT 子句在Hive1.1.0中可用。ALTER TABLE ADD|REPLACE COLUMNS使用 CASCADE更改表元数据的列信息,同时对所有分区元数据进行相同的更改。RESTRICT是默认选项,仅修改表中的字段信息。

  示例1:为表添加2个字段

> ALTER TABLE employee_external_copy1 ADD  COLUMNS (c1 string COMMENT 'test1',c2 int COMMENT 'test2');
> DESC employee_external_copy1
+---------------+--------------------------------+--------------+
|   col_name    |           data_type            |   comment    |
+---------------+--------------------------------+--------------+
| name          | string                         | employ name  |
| work_place    | array<string>                  |              |
| gender_age    | struct<gender:string,age:int>  |              |
| skills_score  | map<string,int>                |              |
| depart_title  | map<string,array<string>>      |              |
| c1            | string                         | test1        |
| c2            | int                            | test2        |
+---------------+--------------------------------+--------------+

  示例2:替换(删除)表的字段

> create table  employee_external_copy2 like employee_external;
> DESC employee_external_copy2
+---------------+--------------------------------+----------+
|   col_name    |           data_type            | comment  |
+---------------+--------------------------------+----------+
| name          | string                         |          |
| work_place    | array<string>                  |          |
| gender_age    | struct<gender:string,age:int>  |          |
| skills_score  | map<string,int>                |          |
| depart_title  | map<string,array<string>>      |          |
+---------------+--------------------------------+----------+
> ALTER TABLE employee_external_copy2 REPLACE  COLUMNS (name string);
> DESC employee_external_copy2;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| name      | string     |          |
+-----------+------------+----------+

注意:无论是修改字段的属性(类型,位置等)还是添加或替换字段,这些操作只是修改hive的元数据,而不会修改表的实际数据。因此用户自己需要确保实际数据布局与元数据的定义是相符的。

参考:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
书籍 Apache Hive Essentials Second Edition (by Dayong Du) Chapter 3

发布了57 篇原创文章 · 获赞 3 · 访问量 1万+

猜你喜欢

转载自blog.csdn.net/CPP_MAYIBO/article/details/102055562