Hive Official User Manual: DDL Guide

This article is my own translation of the original: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL


Hive Data Definition Language

Overview

This is the HiveQL DDL documentation. It includes:

  • CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
  • DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
  • TRUNCATE TABLE
  • ALTER DATABASE/SCHEMA, TABLE, VIEW
  • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
  • SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
  • DESCRIBE DATABASE/SCHEMA, table_name, view_name

PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.

Keywords, Non-reserved Keywords and Reserved Keywords

 

All Keywords

Hive 1.2.0

Non-reserved keywords:

ADD, ADMIN, AFTER, ANALYZE, ARCHIVE, ASC, BEFORE, BUCKET, BUCKETS, CASCADE, CHANGE, CLUSTER, CLUSTERED, CLUSTERSTATUS, COLLECTION, COLUMNS, COMMENT, COMPACT, COMPACTIONS, COMPUTE, CONCATENATE, CONTINUE, DATA, DATABASES, DATETIME, DAY, DBPROPERTIES, DEFERRED, DEFINED, DELIMITED, DEPENDENCY, DESC, DIRECTORIES, DIRECTORY, DISABLE, DISTRIBUTE, ELEM_TYPE, ENABLE, ESCAPED, EXCLUSIVE, EXPLAIN, EXPORT, FIELDS, FILE, FILEFORMAT, FIRST, FORMAT, FORMATTED, FUNCTIONS, HOLD_DDLTIME, HOUR, IDXPROPERTIES, IGNORE, INDEX, INDEXES, INPATH, INPUTDRIVER, INPUTFORMAT, ITEMS, JAR, KEYS, KEY_TYPE, LIMIT, LINES, LOAD, LOCATION, LOCK, LOCKS, LOGICAL, LONG, MAPJOIN, MATERIALIZED, METADATA, MINUS, MINUTE, MONTH, MSCK, NOSCAN, NO_DROP, OFFLINE, OPTION, OUTPUTDRIVER, OUTPUTFORMAT, OVERWRITE, OWNER, PARTITIONED, PARTITIONS, PLUS, PRETTY, PRINCIPALS, PROTECTION, PURGE, READ, READONLY, REBUILD, RECORDREADER, RECORDWRITER, REGEXP, RELOAD, RENAME, REPAIR, REPLACE, REPLICATION, RESTRICT, REWRITE, RLIKE, ROLE, ROLES, SCHEMA, SCHEMAS, SECOND, SEMI, SERDE, SERDEPROPERTIES, SERVER, SETS, SHARED, SHOW, SHOW_DATABASE, SKEWED, SORT, SORTED, SSL, STATISTICS, STORED, STREAMTABLE, STRING, STRUCT, TABLES, TBLPROPERTIES, TEMPORARY, TERMINATED, TINYINT, TOUCH, TRANSACTIONS, UNARCHIVE, UNDO, UNIONTYPE, UNLOCK, UNSET, UNSIGNED, URI, USE, UTC, UTCTIMESTAMP, VALUE_TYPE, VIEW, WHILE, YEAR

Reserved keywords:

ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH

Hive 2.0.0

Non-reserved keywords removed: REGEXP, RLIKE

Non-reserved keywords added: AUTOCOMMIT, ISOLATION, LEVEL, OFFSET, SNAPSHOT, TRANSACTION, WORK, WRITE

Reserved keywords added: COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START

Hive 2.1.0

Non-reserved keywords added: ABORT, KEY, LAST, NORELY, NOVALIDATE, NULLS, RELY, VALIDATE

Reserved keywords added: CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES

Hive 2.2.0

Non-reserved keywords added: DETAIL, DOW, EXPRESSION, OPERATOR, QUARTER, SUMMARY, VECTORIZATION, WEEK, YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS

Reserved keywords added: DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS

Hive 3.0.0

Non-reserved keywords added: TIMESTAMPTZ, ZONE

Reserved keywords added: TIME, NUMERIC

Version information

REGEXP and RLIKE are non-reserved keywords prior to Hive 2.0.0 and reserved keywords starting in Hive 2.0.0 (HIVE-11703).

Reserved keywords are permitted as identifiers if you quote them as described in Supporting Quoted Identifiers in Column Names (version 0.13.0 and later, see HIVE-6013). Most of the keywords are reserved through HIVE-6617 in order to reduce ambiguity in the grammar (version 1.2.0 and later). There are two ways if a user still wants to use those reserved keywords as identifiers: (1) use quoted identifiers; (2) set hive.support.sql11.reserved.keywords=false (version 2.1.0 and earlier).
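For instance, a reserved keyword such as date can serve as a column name when wrapped in backticks (a minimal sketch; the table and column names are made up for illustration):

```sql
-- Backticks let reserved keywords act as identifiers (Hive 0.13.0 and later):
CREATE TABLE t1 (`date` STRING, `user` STRING);

SELECT `date`, `user` FROM t1;
```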

Create/Drop/Alter/Use Database

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
   [COMMENT database_comment]
   [LOCATION hdfs_path]
   [WITH DBPROPERTIES (property_name=property_value, ...)];

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. CREATE DATABASE was added in Hive 0.6 (HIVE-675). The WITH DBPROPERTIES clause was added in Hive 0.7 (HIVE-1836).
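Putting the optional clauses together, a sketch (the database name, path, and property are illustrative):

```sql
CREATE DATABASE IF NOT EXISTS sales_db
  COMMENT 'Database for sales analytics'
  LOCATION '/user/hive/warehouse/sales_db.db'
  WITH DBPROPERTIES ('creator' = 'etl-team');
```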

Drop Database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. DROP DATABASE was added in Hive 0.6 (HIVE-675). The default behavior is RESTRICT, where DROP DATABASE will fail if the database is not empty. To drop the tables in the database as well, use DROP DATABASE ... CASCADE. Support for RESTRICT and CASCADE was added in Hive 0.8 (HIVE-2090).
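A sketch of both behaviors (the database name is illustrative):

```sql
-- RESTRICT (the default) fails if the database still contains tables:
DROP DATABASE IF EXISTS sales_db RESTRICT;

-- CASCADE drops the contained tables along with the database:
DROP DATABASE IF EXISTS sales_db CASCADE;
```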

Alter Database

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
-- (Note: SCHEMA added in Hive 0.14.0)
 
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
-- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
 
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path;
-- (Note: Hive 2.2.1, 2.4.0 and later)

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. ALTER SCHEMA was added in Hive 0.14.0 (HIVE-6601).

The ALTER DATABASE ... SET LOCATION statement does not move the contents of the database's current directory to the newly specified location. It does not change the locations associated with any tables or partitions under the specified database. It only changes the default parent directory where new tables are added for this database. This behavior is analogous to how changing a table's location does not move existing partitions to a different location.
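As a sketch (the database name, property, and path are illustrative):

```sql
-- Attach custom metadata to the database:
ALTER DATABASE sales_db SET DBPROPERTIES ('edited-by' = 'etl-team');

-- Change only the default parent directory for tables created from now on;
-- existing tables and partitions keep their current locations:
ALTER DATABASE sales_db SET LOCATION 'hdfs://namenode:8020/data/sales_db.db';
```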

No other metadata about a database can be changed.

Use Database

USE database_name;
USE DEFAULT;

USE sets the current database for all subsequent HiveQL statements. To revert to the default database, use the keyword "default" instead of a database name. To check which database is currently being used, run SELECT current_database() (as of Hive 0.13.0).

USE database_name was added in Hive 0.6 (HIVE-675).

Create/Drop/Truncate Table

Create Table

-- (Note: TEMPORARY available in Hive 0.14.0 and later)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    
   [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
   [COMMENT table_comment]
   [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
   [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
   [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
      ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
      [STORED AS DIRECTORIES]
   [
    [ROW FORMAT row_format] 
    [STORED AS file_format]
      | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
    -- (Note: Available in Hive 0.6.0 and later)
   ]
   [LOCATION hdfs_path]
   [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
   [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
   LIKE existing_table_or_view_name
   [LOCATION hdfs_path];
 
data_type
   : primitive_type
   | array_type
   | map_type
   | struct_type
   | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
   : TINYINT
   | SMALLINT
   | INT
   | BIGINT
   | BOOLEAN
   | FLOAT
   | DOUBLE
   | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
   | STRING
   | BINARY      -- (Note: Available in Hive 0.8.0 and later)
   | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
   | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
   | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
   | DATE        -- (Note: Available in Hive 0.12.0 and later)
   | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
   | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
   : ARRAY < data_type >
 
map_type
   : MAP < primitive_type, data_type >
 
struct_type
   : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
    : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
   : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
         [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
         [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
   : SEQUENCEFILE
   | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
   | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
   | ORC         -- (Note: Available in Hive 0.11.0 and later)
   | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
   | AVRO        -- (Note: Available in Hive 0.14.0 and later)
   | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
constraint_specification:
   : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
     [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]

CREATE TABLE creates a table with the given name. An error is thrown if a table or view with the same name already exists. You can use IF NOT EXISTS to skip the error.

  • Table names and column names are case insensitive, but SerDe and property names are case sensitive.
    • In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names.
    • In Hive 0.13 and later, column names can contain any Unicode character (see HIVE-6013); however, dot (.) and colon (:) yield errors on querying, so they are disallowed in Hive 1.2.0. Any column name specified within backticks (`) is treated literally; within a backtick string, use double backticks (``) to represent a backtick character. Backtick quotation also enables the use of reserved keywords for table and column identifiers.
    • To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property hive.support.quoted.identifiers to none. In this configuration, backticked names are interpreted as regular expressions. For details, see Supporting Quoted Identifiers in Column Names.
  • Table and column comments are string literals (single-quoted).
  • A table created without the EXTERNAL clause is called a managed table because Hive manages its data. To find out if a table is managed or external, look for tableType in the output of DESCRIBE EXTENDED table_name.
  • The TBLPROPERTIES clause lets you tag the table definition with your own metadata key/value pairs. Some predefined table properties also exist, such as last_modified_user and last_modified_time, which are automatically added and managed by Hive. Other predefined table properties include:
    • TBLPROPERTIES ("comment"="table_comment")
    • TBLPROPERTIES ("hbase.table.name"="table_name") – see HBase Integration.
    • TBLPROPERTIES ("immutable"="true") or ("immutable"="false") in release 0.13.0+ (HIVE-6406) – see Inserting Data into Hive Tables from Queries.
    • TBLPROPERTIES ("orc.compress"="ZLIB") or ("orc.compress"="SNAPPY") or ("orc.compress"="NONE") and other ORC properties – see ORC Files.
    • TBLPROPERTIES ("transactional"="true") or ("transactional"="false") in release 0.14.0+, the default is "false" – see Hive Transactions.
    • TBLPROPERTIES ("NO_AUTO_COMPACTION"="true") or ("NO_AUTO_COMPACTION"="false"), the default is "false" – see Hive Transactions.
    • TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="mapper_memory") – see Hive Transactions.
    • TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.num.threshold"="threshold_num") – see Hive Transactions.
    • TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.pct.threshold"="threshold_pct") – see Hive Transactions.
    • TBLPROPERTIES ("auto.purge"="true") or ("auto.purge"="false") in release 1.2.0+ (HIVE-9118) – see Drop Table, Drop Partitions, Truncate Table, and Insert Overwrite.
    • TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ (HIVE-1329) – Change a managed table to an external table and vice versa for "FALSE".
      • As of Hive 2.4.0 (HIVE-16324) the value of the property 'EXTERNAL' is parsed as a boolean (case insensitive true or false) instead of a case sensitive string comparison.
  • To specify a database for the table, either issue the USE database_name statement prior to the CREATE TABLE statement (in Hive 0.6 and later) or qualify the table name with a database name ("database_name.table.name" in Hive 0.7 and later).
    The keyword "default" can be used for the default database.
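As a sketch combining several of the properties above (table and column names are illustrative; transactional tables additionally require bucketing and the ORC format):

```sql
CREATE TABLE page_events (id BIGINT, payload STRING)
  CLUSTERED BY (id) INTO 4 BUCKETS
  STORED AS ORC
  TBLPROPERTIES (
    'comment'       = 'raw event log',
    'orc.compress'  = 'ZLIB',
    'transactional' = 'true');
```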

See Alter Table below for more information about table comments, table properties, and SerDe properties.

See Type System and Hive Data Types for details about the primitive and complex data types.

Managed and External Tables

By default Hive creates managed tables, where the files, metadata, and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /user/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.

Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.

An external table describes the metadata/schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
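For example, if partition directories were added to HDFS by a process outside of Hive, the metastore can be synchronized with a single statement (the table name is illustrative):

```sql
MSCK REPAIR TABLE page_view;
```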

Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.

A managed or external table can be identified using the DESCRIBE FORMATTED table_name command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on the table type.

Statistics can be managed on internal and external tables and partitions for query optimization.

Storage Formats

Hive supports built-in and custom-developed file formats. See Compressed Storage for details on compressed table storage.
The following are some of the formats built-in to Hive:
 

STORED AS TEXTFILE: Stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.

Use the DELIMITED clause to read delimited files.

Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\') 
Escaping is needed if you want to work with data that can contain these delimiter characters. 

A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').

STORED AS SEQUENCEFILE: Stored as compressed Sequence File.
STORED AS ORC: Stored as ORC file format. Supports ACID Transactions & Cost-based Optimizer (CBO). Stores column-level metadata.
STORED AS PARQUET: Stored as Parquet format for the Parquet columnar storage format in Hive 0.13.0 and later; use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT syntax in Hive 0.10, 0.11, or 0.12.
STORED AS AVRO: Stored as Avro format in Hive 0.14.0 and later (see Avro SerDe).
STORED AS RCFILE: Stored as Record Columnar File format.
STORED BY: Stored by a non-native table format. Used to create or link to a non-native table, for example a table backed by HBase or Druid or Accumulo. See StorageHandlers for more information on this option.
INPUTFORMAT and OUTPUTFORMAT: In the file_format, specify the name of a corresponding InputFormat and OutputFormat class as a string literal.

For example, 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. 

For LZO compression, the values to use are 
'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" 
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"' 

(see LZO Compression).


Row Formats & SerDe

You can create tables with a custom SerDe or using a native SerDe. A native SerDe is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified.
Use the SERDE clause to create a table with a custom SerDe. For more information on SerDes, see:

You must specify a list of columns for tables that use a native SerDe. Refer to the Types part of the User Guide for the allowable column types.

A list of columns for tables that use a custom SerDe may be specified, but Hive will query the SerDe to determine the actual list of columns for this table.

For general information about SerDes, see Hive SerDe in the Developer Guide. Also see SerDe for details about input and output processing.

To change a table's SerDe or SERDEPROPERTIES, use the ALTER TABLE statement described below in Add SerDe Properties.

Row Format

RegEx

ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES 
(
"input.regex" = "<regex>"
)
STORED AS TEXTFILE;

Stored as plain text file, translated by Regular Expression.

The following example defines a table in the default Apache Weblog format.

 

CREATE TABLE apachelog (
   host STRING,
   identity STRING,
   user  STRING,
   time  STRING,
   request STRING,
   status STRING,
   size  STRING,
   referer STRING,
   agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
   "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;

More about RegexSerDe can be found here in HIVE-662 and HIVE-1719.

JSON 

ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe' 
STORED AS TEXTFILE
Stored as plain text file in JSON format.

The JsonSerDe for JSON files is available in  Hive 0.12 and later.

In some distributions, a reference to hive-hcatalog-core.jar is required.
ADD JAR /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;

CREATE TABLE my_table(a string, b bigint, ...)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

The JsonSerDe was moved to Hive from HCatalog and before it was in hive-contrib project. It was added to the Hive distribution by  HIVE-4895.
An Amazon SerDe is available at  s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.


CSV/TSV

ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
STORED AS TEXTFILE
Stored as plain text file in CSV / TSV format.
 
The CSVSerde is available in  Hive 0.14 and greater.
The following example creates a TSV (Tab-separated) file.
 
CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
The default properties of the SerDe are for a comma-separated (CSV) file:
 
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER  "
DEFAULT_SEPARATOR        ,

This SerDe works for most CSV data, but does not handle embedded newlines. To use the SerDe, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde.  

Documentation is based on original documentation at https://github.com/ogrodnek/csv-serde.

Limitations
This SerDe treats all columns as being of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output will show string column types.
The type information is retrieved from the SerDe. 

To convert columns to the desired type in a table, you can create a view over the table that does the CAST to the desired type.
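A sketch of such a view, assuming a table my_table created with the OpenCSVSerde as above where column b is logically a number:

```sql
-- OpenCSVSerde exposes every column as STRING, so cast in a view:
CREATE VIEW my_table_typed AS
SELECT a, CAST(b AS BIGINT) AS b
FROM my_table;
```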

The CSV SerDe is based on  https://github.com/ogrodnek/csv-serde, and was added to the Hive distribution in  HIVE-7777.
The CSVSerde has been built and tested against Hive 0.14 and later, and uses  Open-CSV 2.3 which is bundled with the Hive distribution.

For general information about SerDes, see  Hive SerDe in the Developer Guide. Also see  SerDe for details about input and output processing.

Partitioned Tables

Partitioned tables can be created using the PARTITIONED BY clause. A table can have one or more partition columns, and a separate data directory is created for each distinct value combination in the partition columns. Further, tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.

If, when creating a partitioned table, you get this error: "FAILED: Error in semantic analysis: Column repeated in partitioning columns," it means you are trying to include the partition column in the data of the table itself. You probably really do have the column defined. However, the partition you create makes a pseudocolumn on which you can query, so you must rename your table column to something else (that users should not query on!).

For example, suppose your original unpartitioned table had three columns: id, date, and name.

Example:
id     int,
date   date,
name   varchar

Now you want to partition on date. Your Hive definition could use "dtDontQuery" as a column name so that "date" can be used for partitioning (and querying).

Example:
create table table_name (
   id                int,
   dtDontQuery       string,
   name              string
)
partitioned by (date string)

Now your users will still query on "where date = '...'", but the second column dtDontQuery will hold the original values.

Here's an example statement to create a partitioned table:

Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
      page_url STRING, referrer_url STRING,
      ip STRING COMMENT  'IP Address of the User' )
  COMMENT  'This is the page view table'
  PARTITIONED BY(dt STRING, country STRING)
  STORED AS SEQUENCEFILE;

The statement above creates the page_view table with viewTime, userid, page_url, referrer_url, and ip columns (including comments). The table is also partitioned, and data is stored in sequence files. The data format in the files is assumed to be field-delimited by ctrl-A and row-delimited by newline.

Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
      page_url STRING, referrer_url STRING,
      ip STRING COMMENT  'IP Address of the User' )
  COMMENT  'This is the page view table'
  PARTITIONED BY(dt STRING, country STRING)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY  '\001'
STORED AS SEQUENCEFILE;

The statement above lets you create the same table as the previous table.

In the previous examples the data is stored in <hive.metastore.warehouse.dir>/page_view. Specify a value for the key hive.metastore.warehouse.dir in the Hive config file hive-site.xml.

External Tables

The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When an EXTERNAL table is dropped, the data in the table is NOT deleted from the file system.

An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.

Example:
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
      page_url STRING, referrer_url STRING,
      ip STRING COMMENT  'IP Address of the User' ,
      country STRING COMMENT  'country of origination' )
  COMMENT  'This is the staging page view table'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY  '\054'
  STORED AS TEXTFILE
  LOCATION  '<hdfs_location>' ;

You can use the statement above to create a page_view table which points to any HDFS location for its storage. But you still have to make sure that the data is delimited as specified in the CREATE statement above.

For another example of creating an external table, see Loading Data in the Tutorial.

Create Table As Select (CTAS)

Tables can also be created and populated by the results of a query in one create-table-as-select (CTAS) statement. The table created by CTAS is atomic, meaning that the table is not seen by other users until all the query results are populated. So other users will either see the table with the complete results of the query or will not see the table at all.


There are two parts in CTAS, the SELECT part can be any SELECT statement supported by HiveQL. The CREATE part of the CTAS takes the resulting schema from the SELECT part and creates the target table with other table properties such as the SerDe and storage format.

CTAS has these restrictions:

  • The target table cannot be a partitioned table.
  • The target table cannot be an external table.
  • The target table cannot be a list bucketing table.
Example:
CREATE TABLE new_key_value_store
    ROW FORMAT SERDE  "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
    STORED AS RCFile
    AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2 etc. In addition, the new target table is created using a specific SerDe and a storage format independent of the source tables in the SELECT statement.

Starting with Hive 0.13.0, the SELECT statement can include one or more common table expressions (CTEs), as shown in the SELECT syntax. For an example, see Common Table Expression.

Being able to select data from one table to another is one of the most powerful features of Hive. Hive handles the conversion of the data from the source format to the destination format as the query is being executed.

Create Table Like

The LIKE form of CREATE TABLE allows you to copy an existing table definition exactly (without copying its data). In contrast to CTAS, the statement below creates a new empty_key_value_store table whose definition exactly matches the existing key_value_store in all particulars other than table name. The new table contains no rows.

CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];

Before Hive 0.8.0, CREATE TABLE LIKE view_name would make a copy of the view. In Hive 0.8.0 and later releases, CREATE TABLE LIKE view_name creates a table by adopting the schema of view_name (fields and partition columns) using defaults for SerDe and file formats.

Bucketed Sorted Tables

Example:
CREATE TABLE page_view(viewTime INT, userid BIGINT,
      page_url STRING, referrer_url STRING,
      ip STRING COMMENT  'IP Address of the User' )
  COMMENT  'This is the page view table'
  PARTITIONED BY(dt STRING, country STRING)
  CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY  '\001'
    COLLECTION ITEMS TERMINATED BY  '\002'
    MAP KEYS TERMINATED BY  '\003'
  STORED AS SEQUENCEFILE;

In the example above, the page_view table is bucketed (clustered by) userid and within each bucket the data is sorted in increasing order of viewTime. Such an organization allows the user to do efficient sampling on the clustered column - in this case userid. The sorting property allows internal operators to take advantage of the better-known data structure while evaluating queries, also increasing efficiency. MAP KEYS and COLLECTION ITEMS keywords can be used if any of the columns are lists or maps.

The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.
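A sketch of a conforming insert into the 32-bucket table above (the staging table and partition values are illustrative; in older Hive versions, setting hive.enforce.bucketing=true derives the reducer count automatically instead of setting mapred.reduce.tasks by hand):

```sql
set hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='US')
SELECT viewTime, userid, page_url, referrer_url, ip
FROM page_view_stg
CLUSTER BY userid;
```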

There is also an example of creating and populating bucketed tables.

Skewed Tables

Version information

As of Hive 0.10.0 (HIVE-3072 and HIVE-3649). See HIVE-3026 for additional JIRA tickets that implemented list bucketing in Hive 0.10.0 and 0.11.0.

Design documents

Read the Skewed Join Optimization and List Bucketing design documents for more information.

This feature can be used to improve performance for tables where one or more columns have skewed values. By specifying the values that appear very often (heavy skew) Hive will split those out into separate files (or directories in case of list bucketing) automatically and take this fact into account during queries so that it can skip or include the whole file (or directory in case of list bucketing) if possible.

This can be specified on a per-table level during table creation.

The following example shows one column with three skewed values, optionally with the STORED AS DIRECTORIES clause which specifies list bucketing.

Example:
CREATE TABLE list_bucket_single (key STRING, value STRING)
   SKEWED BY (key) ON (1, 5, 6) [STORED AS DIRECTORIES];

And here is an example of a table with two skewed columns.

Example:
CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
   SKEWED BY (col1, col2) ON (('s1', 1), ('s3', 3), ('s13', 13), ('s78', 78)) [STORED AS DIRECTORIES];

For corresponding ALTER TABLE statements, see Alter Table Skewed or Stored as Directories below.

Temporary Tables

Version information

As of Hive 0.14.0 (HIVE-7090).

A table that has been created as a temporary table will only be visible to the current session. Data will be stored in the user's scratch directory, and deleted at the end of the session.

If a temporary table is created with a database/table name of a permanent table which already exists in the database, then within that session any references to that table will resolve to the temporary table, rather than to the permanent table. The user will not be able to access the original table within that session without either dropping the temporary table, or renaming it to a non-conflicting name.
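A sketch of this shadowing behavior (the names are illustrative):

```sql
-- Suppose a permanent table mydb.users already exists.
CREATE TEMPORARY TABLE mydb.users (id INT, name STRING);

-- Within this session, references now resolve to the (empty) temporary table:
SELECT * FROM mydb.users;

-- Dropping the temporary table makes the permanent table visible again:
DROP TABLE mydb.users;
```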

Temporary tables have the following limitations:

  • Partition columns are not supported.
  • No support for creation of indexes.

Starting in Hive 1.1.0 the storage policy for temporary tables can be set to memory, ssd, or default with the hive.exec.temporary.table.storage configuration parameter (see HDFS Storage Types and Storage Policies).

Constraints

Version information

As of Hive 2.1.0 (HIVE-13290).

Hive includes support for non-validated primary and foreign key constraints. Some SQL tools generate more efficient queries when constraints are present. Since these constraints are not validated, an upstream system needs to ensure data integrity before it is loaded into Hive.

Example:
create table pk(id1 integer, id2 integer,
   primary key(id1, id2) disable novalidate);
 
create table fk(id1 integer, id2 integer,
   constraint c1 foreign key(id1, id2) references pk(id2, id1) disable novalidate);

Drop Table

DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)

DROP TABLE removes metadata and data for this table. The data is actually moved to the .Trash/Current directory if Trash is configured (and PURGE is not specified). The metadata is completely lost.

When dropping an EXTERNAL table, data in the table will NOT be deleted from the file system.

When dropping a table referenced by views, no warning is given (the views are left dangling as invalid and must be dropped or recreated by the user).

Otherwise, the table information is removed from the metastore and the raw data is removed as if by 'hadoop dfs -rm'. In many cases, this results in the table data being moved into the user's .Trash folder in their home directory; users who mistakenly DROP TABLEs may thus be able to recover their lost data by recreating a table with the same schema, recreating any necessary partitions, and then moving the data back into place manually using Hadoop. This solution is subject to change over time or across installations as it relies on the underlying implementation; users are strongly encouraged not to drop tables capriciously.

Version information: PURGE

The PURGE option is added in version 0.14.0 by HIVE-7100.

If PURGE is specified, the table data does not go to the .Trash/Current directory and so cannot be retrieved in the event of a mistaken DROP. The purge option can also be specified with the table property auto.purge (see TBLPROPERTIES above).

In Hive 0.7.0 or later, DROP returns an error if the table doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

See the Alter Partition section below for how to drop partitions.

Truncate Table

Version information

As of Hive 0.11.0 (HIVE-446).

TRUNCATE TABLE table_name [PARTITION partition_spec];
 
partition_spec:
   : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Removes all rows from a table or partition(s). The rows will be trashed if the filesystem Trash is enabled, otherwise they are deleted (as of Hive 2.2.0 with HIVE-14626). Currently the target table should be native/managed table or an exception will be thrown. User can specify partial partition_spec for truncating multiple partitions at once and omitting partition_spec will truncate all partitions in the table.
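A sketch against the partitioned page_view table from earlier (the partition values are illustrative):

```sql
-- Truncate one specific partition:
TRUNCATE TABLE page_view PARTITION (dt='2008-06-08', country='US');

-- A partial partition_spec truncates every matching partition:
TRUNCATE TABLE page_view PARTITION (dt='2008-06-08');

-- Omitting the spec truncates all partitions:
TRUNCATE TABLE page_view;
```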

Starting with HIVE 2.3.0 (HIVE-15880) if the table property "auto.purge" (see TBLPROPERTIES above) is set to "true" the data of the table is not moved to Trash when a TRUNCATE TABLE command is issued against it and cannot be retrieved in the event of a mistaken TRUNCATE. This is applicable only for managed tables (see managed tables). This behavior can be turned off if the "auto.purge" property is unset or set to false for a managed table.

Alter Table/Partition/Column

Alter table statements enable you to change the structure of an existing table. You can add columns/partitions, change SerDe, add table and SerDe properties, or rename the table itself. Similarly, alter table partition statements allow you change the properties of a specific partition in the named table.

Alter Table

Rename Table

ALTER TABLE table_name RENAME TO new_table_name;

This statement lets you change the name of a table to a different name.

As of version 0.6, a rename on a managed table moves its HDFS location. Rename has been changed as of version 2.2.0 (HIVE-14909) so that a managed table's HDFS location is moved only if the table is created without a LOCATION clause and under its database directory. Hive versions prior to 0.6 just renamed the table in the metastore without moving the HDFS location.

Alter Table Properties

ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
   : (property_name = property_value, property_name = property_value, ... )

You can use this statement to add your own metadata to the tables. Currently last_modified_user, last_modified_time properties are automatically added and managed by Hive. Users can add their own properties to this list. You can do DESCRIBE EXTENDED TABLE to get this information.

For more information, see the TBLPROPERTIES clause in Create Table above.

Alter Table Comment

To change the comment of a table you have to change the comment property of the TBLPROPERTIES:

ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);

Add SerDe Properties

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
 
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
 
serde_properties:
   : (property_name = property_value, property_name = property_value, ... )

These statements enable you to change a table's SerDe or add user-defined metadata to the table's SerDe object.

The SerDe properties are passed to the table's SerDe when it is being initialized by Hive to serialize and deserialize data. So users can store any information required for their custom SerDe here. Refer to the SerDe documentation and Hive SerDe in the Developer Guide for more information, and see Row Format, Storage Format, and SerDe above for details about setting a table's SerDe and SERDEPROPERTIES in a CREATE TABLE statement.

Note that both property_name and property_value must be quoted.

Example:
ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = ',');

Alter Table Storage Properties

ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
   INTO num_buckets BUCKETS;

These statements change the table's physical storage properties.

NOTE: These commands will only modify Hive's metadata, and will NOT reorganize or reformat existing data. Users should make sure the actual data layout conforms with the metadata definition.
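As a sketch (table and column names are illustrative), the following re-declares a table as bucketed; remember that files written before this statement are not re-bucketed:

```sql
-- Declare 32 buckets on user_id, sorted by view_time within each bucket.
-- Metadata only: existing data is NOT reorganized to match.
ALTER TABLE page_view CLUSTERED BY (user_id) SORTED BY (view_time) INTO 32 BUCKETS;
```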

Alter Table Skewed or Stored as Directories

Version information

As of Hive 0.10.0 (HIVE-3072 and HIVE-3649). See HIVE-3026 for additional JIRA tickets that implemented list bucketing in Hive 0.10.0 and 0.11.0.

A table's SKEWED and STORED AS DIRECTORIES options can be changed with ALTER TABLE statements. See Skewed Tables above for the corresponding CREATE TABLE syntax.

Alter Table Skewed
ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...)
   ON ((col_name1_value, col_name2_value, ...) [, (col_name1_value, col_name2_value, ...), ...])
   [STORED AS DIRECTORIES];

The STORED AS DIRECTORIES option determines whether a skewed table uses the list bucketing feature, which creates subdirectories for skewed values.
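A hypothetical example (table, column, and skewed values are illustrative): mark a table as skewed on a column dominated by a few values, and store those values in their own subdirectories:

```sql
-- 'US' and 'CN' are assumed to be the heavily skewed values of country
ALTER TABLE page_view SKEWED BY (country) ON ('US', 'CN') STORED AS DIRECTORIES;
```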

Alter Table Not Skewed
ALTER TABLE table_name NOT SKEWED;

The NOT SKEWED option makes the table non-skewed and turns off the list bucketing feature (since a list-bucketing table is always skewed). This affects partitions created after the ALTER statement, but has no effect on partitions created before the ALTER statement.

Alter Table Not Stored as Directories
ALTER TABLE table_name NOT STORED AS DIRECTORIES;

This turns off the list bucketing feature, although the table remains skewed.

Alter Table Set Skewed Location
ALTER TABLE table_name SET SKEWED LOCATION (col_name1="location1" [, col_name2="location2", ...]);

This changes the location map for list bucketing.

Alter Table Constraints

Version information

As of Hive release 2.1.0.

Table constraints can be added or removed via ALTER TABLE statements.

ALTER TABLE table_name ADD CONSTRAINT constraint_name PRIMARY KEY (column, ...) DISABLE NOVALIDATE;
ALTER TABLE table_name ADD CONSTRAINT constraint_name FOREIGN KEY (column, ...) REFERENCES table_name(column, ...) DISABLE NOVALIDATE RELY;
ALTER TABLE table_name DROP CONSTRAINT constraint_name;
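For instance, with hypothetical orders and customers tables, constraints might be declared as follows (Hive constraints are informational only, hence DISABLE NOVALIDATE):

```sql
ALTER TABLE customers ADD CONSTRAINT pk_customers PRIMARY KEY (customer_id) DISABLE NOVALIDATE;
ALTER TABLE orders ADD CONSTRAINT fk_orders_customers FOREIGN KEY (customer_id)
   REFERENCES customers(customer_id) DISABLE NOVALIDATE RELY;
ALTER TABLE orders DROP CONSTRAINT fk_orders_customers;
```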

Additional Alter Table Statements

See Alter Either Table or Partition below for more DDL statements that alter tables.

Alter Partition

Partitions can be added, renamed, exchanged (moved), dropped, or (un)archived by using the PARTITION clause in an ALTER TABLE statement, as described below. To make the metastore aware of partitions that were added directly to HDFS, you can use the metastore check command (MSCK) or on Amazon EMR you can use the RECOVER PARTITIONS option of ALTER TABLE. See Alter Either Table or Partition below for more ways to alter partitions.

Version 1.2+

As of Hive 1.2 (HIVE-10307), the partition values specified in partition specification are type checked, converted, and normalized to conform to their column types if the property hive.typecheck.on.insert is set to true (default). The values can be number literals.

Add Partitions

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
 
partition_spec:
   : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

You can use ALTER TABLE ADD PARTITION to add partitions to a table. Partition values should be quoted only if they are strings. The location must be a directory inside of which data files reside. (ADD PARTITION changes the table metadata, but does not load data. If the data does not exist in the partition's location, queries will not return any results.) An error is thrown if the partition_spec for the table already exists. You can use IF NOT EXISTS to skip the error.

Version 0.7

Although it is proper syntax to have multiple partition_spec in a single ALTER TABLE, if you do this in version 0.7 your partitioning scheme will fail. That is, every query specifying a partition will always use only the first partition.

Specifically, the following example will fail silently and without error in Hive 0.7, and all queries will go only to the dt='2008-08-08' partition, no matter which partition you specify.

Example:
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') LOCATION '/path/to/us/part080808'
                          PARTITION (dt='2008-08-09', country='us') LOCATION '/path/to/us/part080809';

In Hive 0.8 and later, you can add multiple partitions in a single ALTER TABLE statement as shown in the previous example.

In Hive 0.7, if you want to add many partitions you should use the following form:

ALTER TABLE table_name ADD PARTITION (partCol='value1') LOCATION 'loc1';
ALTER TABLE table_name ADD PARTITION (partCol='value2') LOCATION 'loc2';
...
ALTER TABLE table_name ADD PARTITION (partCol='valueN') LOCATION 'locN';
Dynamic Partitions

Partitions can be added to a table dynamically, using a Hive INSERT statement (or a Pig STORE statement). See the Design Document for Dynamic Partitions and the dynamic-partition insert sections of the Tutorial and Hive DML documents for details and examples.

Rename Partition

Version information

As of Hive 0.9.

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

This statement lets you change the value of a partition column. One use case is normalizing a legacy partition column value to conform to its type. In this case, type conversion and normalization are not enabled for the column values in the old partition_spec, even with the property hive.typecheck.on.insert set to true (the default), which allows you to specify the legacy data in string form in the old partition_spec.
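For example (table name and values are illustrative), a legacy partition value can be renamed to its normalized form:

```sql
-- Rename partition dt='20080808' to the normalized value dt='2008-08-08'
ALTER TABLE page_view PARTITION (dt='20080808') RENAME TO PARTITION (dt='2008-08-08');
```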

Exchange Partition

Partitions can be exchanged (moved) between tables.

Version information

As of Hive 0.12 (HIVE-4095). Multiple partitions supported in Hive versions 1.2.2, 1.3.0, and 2.0.0+.

-- Move partition from table_name_1 to table_name_2
ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec) WITH TABLE table_name_1;
-- multiple partitions
ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec, partition_spec2, ...) WITH TABLE table_name_1;

This statement lets you move the data in a partition from a table to another table that has the same schema and does not already have that partition. 
For further details on this feature, see Exchange Partition and HIVE-4095.
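A minimal sketch with hypothetical tables that share the same schema and partition keys:

```sql
-- Moves the data for dt='2008-08-08' out of staging_page_view and into page_view;
-- this fails if page_view already has that partition.
ALTER TABLE page_view EXCHANGE PARTITION (dt='2008-08-08') WITH TABLE staging_page_view;
```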

Recover Partitions (MSCK REPAIR TABLE)

Hive stores a list of partitions for each table in its metastore. If, however, new partitions are directly added to HDFS (say by using hadoop fs -put command), the metastore (and hence Hive) will not be aware of these partitions unless the user runs ALTER TABLE table_name ADD PARTITION commands on each of the newly added partitions.

However, users can run a metastore check command with the repair table option:

MSCK REPAIR TABLE table_name;

which will add metadata about partitions to the Hive metastore for partitions for which such metadata doesn't already exist. In other words, it will add any partitions that exist on HDFS but not in the metastore. See HIVE-874 for more details. When there is a large number of untracked partitions, MSCK REPAIR TABLE can be run batch-wise to avoid an out-of-memory error (OOME): set the property hive.msck.repair.batch.size to the desired batch size and the command will process partitions in batches internally. The default value of the property is zero, which means all partitions are processed at once.
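For example, to repair a large table in batches rather than all at once (the table name and batch size of 3000 are illustrative):

```sql
SET hive.msck.repair.batch.size=3000;
MSCK REPAIR TABLE page_view;
```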

The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:

ALTER TABLE table_name RECOVER PARTITIONS;

Starting with Hive 1.3, MSCK will throw exceptions if directories with disallowed characters in partition values are found on HDFS. Use the hive.msck.path.validation setting on the client to alter this behavior: "skip" will simply skip the directories, and "ignore" will try to create partitions anyway (the old behavior); this may or may not work.

Drop Partitions

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
   [IGNORE PROTECTION] [PURGE];            -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available in 2.0.0 and later)

You can use ALTER TABLE DROP PARTITION to drop a partition for a table. This removes the data and metadata for this partition. The data is actually moved to the .Trash/Current directory if Trash is configured, unless PURGE is specified, but the metadata is completely lost (see Drop Table above).

Version Information: PROTECTION

IGNORE PROTECTION is no longer available in versions 2.0.0 and later. This functionality is replaced by using one of the several security options available with Hive (see SQL Standard Based Hive Authorization). See HIVE-11145 for details.

For tables that are protected by NO_DROP CASCADE, you can use the predicate IGNORE PROTECTION to drop a specified partition or set of partitions (for example, when splitting a table between two Hadoop clusters):

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec IGNORE PROTECTION;

The above command will drop that partition regardless of protection state.

Version information: PURGE

The PURGE option is added to ALTER TABLE in version 1.2.1 by HIVE-10934.

If PURGE is specified, the partition data does not go to the .Trash/Current directory and so cannot be retrieved in the event of a mistaken DROP:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec PURGE;     -- (Note: Hive 1.2.0 and later)

The purge option can also be specified with the table property auto.purge (see TBLPROPERTIES above).

In Hive 0.7.0 or later, DROP returns an error if the partition doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

ALTER TABLE page_view DROP PARTITION (dt= '2008-08-08' , country= 'us' );

(Un)Archive Partition

ALTER TABLE table_name ARCHIVE PARTITION partition_spec;
ALTER TABLE table_name UNARCHIVE PARTITION partition_spec;

Archiving is a feature that moves a partition's files into a Hadoop Archive (HAR). Note that only the file count is reduced; HAR does not provide any compression. See LanguageManual Archiving for more information.

Alter Either Table or Partition

Alter Table/Partition File Format

ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;

This statement changes the table's (or partition's) file format. For available file_format options, see the section above on CREATE TABLE. The operation only changes the table metadata. Any conversion of existing data must be done outside of Hive.
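A sketch (table name, partition, and format are illustrative); since existing files are not converted, this is typically issued before writing new data in the new format:

```sql
-- Future writes to this partition are expected in ORC; existing files are untouched.
ALTER TABLE page_view PARTITION (dt='2008-08-08') SET FILEFORMAT ORC;
```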

Alter Table/Partition Location

ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location";

Alter Table/Partition Touch

ALTER TABLE table_name TOUCH [PARTITION partition_spec];

TOUCH reads the metadata and writes it back. This has the effect of causing the pre/post execute hooks to fire. An example use case: you have a hook that logs all the tables/partitions that were modified, along with an external script that alters the files on HDFS directly. Since the script modifies files outside of Hive, the modification wouldn't be logged by the hook. The external script could call TOUCH to fire the hook and mark the table or partition as modified.

Also, it may be useful later if we incorporate reliable last modified times. Then touch would update that time as well.

Note that TOUCH doesn't create a table or partition if it doesn't already exist. (See Create Table.)

Alter Table/Partition Protections

Version information

As of Hive 0.7.0 (HIVE-1413). The CASCADE clause for NO_DROP was added in HIVE 0.8.0 (HIVE-2605).

This functionality was removed in Hive 2.0.0. This functionality is replaced by using one of the several security options available with Hive (see SQL Standard Based Hive Authorization). See HIVE-11145 for details.

ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE NO_DROP [CASCADE];
 
ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE OFFLINE;

Protection on data can be set at either the table or partition level. Enabling NO_DROP prevents a table from being dropped. Enabling OFFLINE prevents the data in a table or partition from being queried, but the metadata can still be accessed.

If any partition in a table has NO_DROP enabled, the table cannot be dropped either. Conversely, if a table has NO_DROP enabled then partitions may be dropped, but with NO_DROP CASCADE partitions cannot be dropped either unless the drop partition command specifies IGNORE PROTECTION.

Alter Table/Partition Compact

Version information

In Hive release 0.13.0 and later, when transactions are being used, the ALTER TABLE statement can request compaction of a table or partition. As of Hive releases 1.3.0 and 2.1.0, when transactions are being used, the ALTER TABLE ... COMPACT statement can include a TBLPROPERTIES clause, either to change compaction MapReduce job properties or to overwrite any other Hive table properties. More details can be found here.

ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])]
   COMPACT 'compaction_type'[AND WAIT]
   [WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...])];

In general you do not need to request compactions when Hive transactions are being used, because the system will detect the need for them and initiate the compaction. However, if compaction is turned off for a table or you want to compact the table at a time the system would not choose to, ALTER TABLE can initiate the compaction. By default the statement will enqueue a request for compaction and return. To watch the progress of the compaction, use SHOW COMPACTIONS. As of Hive 2.2.0 "AND WAIT" may be specified to have the operation block until compaction completes.

The compaction_type can be MAJOR or MINOR. See the Basic Design section in Hive Transactions for more information.

Alter Table/Partition Concatenate

Version information

In Hive release 0.8.0, RCFile added support for fast block-level merging of small RCFiles using the CONCATENATE command. In Hive release 0.14.0, ORC added support for fast stripe-level merging of small ORC files using the CONCATENATE command.

ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;

If the table or partition contains many small RCFiles or ORC files, then the above command will merge them into larger files. In case of RCFile the merge happens at block level whereas for ORC files the merge happens at stripe level thereby avoiding the overhead of decompressing and decoding the data.

Alter Column

Rules for Column Names

Column names are case insensitive.

Version information

In Hive release 0.12.0 and earlier, column names can only contain alphanumeric and underscore characters.

In Hive release 0.13.0 and later, by default column names can be specified within backticks (`) and contain any Unicode character (HIVE-6013), however, dot (.) and colon (:) yield errors on querying. Within a string delimited by backticks, all characters are treated literally except that double backticks (``) represent one backtick character. The pre-0.13.0 behavior can be used by setting hive.support.quoted.identifiers to none, in which case backticked names are interpreted as regular expressions. See Supporting Quoted Identifiers in Column Names for details.

Backtick quotation enables the use of reserved keywords for column names, as well as table names.

Change Column Name/Type/Position/Comment

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
   [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];

This command will allow users to change a column's name, data type, comment, or position, or an arbitrary combination of them. The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage. A patch for Hive 0.13 is also available (see HIVE-7971).

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata.

ALTER TABLE CHANGE COLUMN CASCADE clause will override the table partition's column metadata regardless of the table or partition's protection mode. Use with discretion.

The column change command will only modify Hive's metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition.

Example:
CREATE TABLE test_change (a int, b int, c int);
 
-- First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;
 
-- Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
-- The new table's structure is: b int, a2 string, c int.
 
-- Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
-- The new table's structure is: c1 int, b int, a2 string.
 
-- Add a comment to column a1.
ALTER TABLE test_change CHANGE a1 a1 INT COMMENT 'this is column a1';

Add/Replace Columns

ALTER TABLE table_name 
   [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
   ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
   [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)

ADD COLUMNS lets you add new columns to the end of the existing columns but before the partition columns. This is supported for Avro backed tables as well, for Hive 0.14 and later.

REPLACE COLUMNS removes all existing columns and adds the new set of columns. This can be done only for tables with a native SerDe (DynamicSerDe, MetadataTypedColumnsetSerDe, LazySimpleSerDe and ColumnarSerDe). Refer to Hive SerDe for more information. REPLACE COLUMNS can also be used to drop columns. For example, "ALTER TABLE test_change REPLACE COLUMNS (a int, b int);" will remove column 'c' from test_change's schema.

The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage.

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.

ALTER TABLE ADD or REPLACE COLUMNS CASCADE will override the table partition's column metadata regardless of the table's or partition's protection mode. Use with discretion.

The column change command will only modify Hive's metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition.

Partial Partition Specification

As of Hive 0.14 (HIVE-8411), users are able to provide a partial partition spec for certain above alter column statements, similar to dynamic partitioning. So rather than having to issue an alter column statement for each partition that needs to be changed:

ALTER TABLE foo PARTITION (ds='2008-04-08', hr=11) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);
ALTER TABLE foo PARTITION (ds='2008-04-08', hr=12) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);
...

... you can change many existing partitions at once using a single ALTER statement with a partial partition specification:

-- hive.exec.dynamic.partition needs to be set to true to enable dynamic partitioning with ALTER PARTITION
SET hive.exec.dynamic.partition = true;
 
-- This will alter all existing partitions in the table with ds='2008-04-08' -- be sure you know what you are doing!
ALTER TABLE foo PARTITION (ds='2008-04-08', hr) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);
 
-- This will alter all existing partitions in the table -- be sure you know what you are doing!
ALTER TABLE foo PARTITION (ds, hr) CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18);

Similar to dynamic partitioning, hive.exec.dynamic.partition must be set to true to enable use of partial partition specs during ALTER PARTITION. This is supported for the following operations:

  • Change column
  • Add column
  • Replace column
  • File Format
  • Serde Properties

Create/Drop/Alter View

Version information

View support is only available in Hive 0.6 and later.

Create View

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
   [COMMENT view_comment]
   [TBLPROPERTIES (property_name = property_value, ...)]
   AS SELECT ...;

CREATE VIEW creates a view with the given name. An error is thrown if a table or view with the same name already exists. You can use IF NOT EXISTS to skip the error.

If no column names are supplied, the names of the view's columns will be derived automatically from the defining SELECT expression. (If the SELECT contains unaliased scalar expressions such as x+y, the resulting view column names will be generated in the form _C0, _C1, etc.) When renaming columns, column comments can also optionally be supplied. (Comments are not automatically inherited from underlying columns.)

A CREATE VIEW statement will fail if the view's defining SELECT expression is invalid.

Note that a view is a purely logical object with no associated storage. (No support for materialized views is currently available in Hive.) When a query references a view, the view's definition is evaluated in order to produce a set of rows for further processing by the query. (This is a conceptual description; in fact, as part of query optimization, Hive may combine the view's definition with the query's, e.g. pushing filters from the query down into the view.)

A view's schema is frozen at the time the view is created; subsequent changes to underlying tables (e.g. adding a column) will not be reflected in the view's schema. If an underlying table is dropped or changed in an incompatible fashion, subsequent attempts to query the invalid view will fail.

Views are read-only and may not be used as the target of LOAD/INSERT/ALTER. For changing metadata, see ALTER VIEW.

A view may contain ORDER BY and LIMIT clauses. If a referencing query also contains these clauses, the query-level clauses are evaluated after the view clauses (and after any other operations in the query). For example, if a view specifies LIMIT 5, and a referencing query is executed as (select * from v LIMIT 10), then at most 5 rows will be returned.

Starting with Hive 0.13.0, the view's select statement can include one or more common table expressions (CTEs) as shown in the SELECT syntax. For examples of CTEs in CREATE VIEW statements, see Common Table Expression.

Example:
CREATE VIEW onion_referrers(url COMMENT 'URL of Referring page')
   COMMENT 'Referrers to The Onion website'
   AS
   SELECT DISTINCT referrer_url
   FROM page_view
   WHERE page_url='http://www.theonion.com';

Use SHOW CREATE TABLE to display the CREATE VIEW statement that created a view. As of Hive 2.2.0, SHOW VIEWS displays a list of views in a database.

Version Information

Originally, the file format for views was hard coded as SequenceFile. Hive 2.1.0 (HIVE-13736) made views follow the same defaults as tables and indexes using the hive.default.fileformat and hive.default.fileformat.managed properties.

Drop View

DROP VIEW [IF EXISTS] [db_name.]view_name;

DROP VIEW removes metadata for the specified view. (It is illegal to use DROP TABLE on a view.)

When dropping a view referenced by other views, no warning is given (the dependent views are left dangling as invalid and must be dropped or recreated by the user).

In Hive 0.7.0 or later, DROP returns an error if the view doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

Example:
DROP VIEW onion_referrers;

Alter View Properties

ALTER VIEW [db_name.]view_name SET TBLPROPERTIES table_properties;
 
table_properties:
   : (property_name = property_value, property_name = property_value, ...)

As with ALTER TABLE, you can use this statement to add your own metadata to a view.
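For example, using the onion_referrers view created earlier (the 'created_by' property is an illustrative user-defined key):

```sql
ALTER VIEW onion_referrers SET TBLPROPERTIES ('created_by' = 'analytics_team');
```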

Alter View As Select

Version information

As of Hive 0.11.

ALTER VIEW [db_name.]view_name AS select_statement;

Alter View As Select changes the definition of a view, which must exist. The syntax is similar to that for CREATE VIEW and the effect is the same as for CREATE OR REPLACE VIEW.

Note: The view must already exist, and if the view has partitions, it cannot be replaced by ALTER VIEW AS SELECT.
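Using the onion_referrers view from the CREATE VIEW example above, its definition could be replaced like this (the narrower page_url filter is illustrative):

```sql
ALTER VIEW onion_referrers AS
   SELECT DISTINCT referrer_url
   FROM page_view
   WHERE page_url = 'http://www.theonion.com/video';
```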

Create/Drop/Alter Index

Version information

As of Hive 0.7.

This section provides a brief introduction to Hive indexes, which are documented more fully in Indexes and in the Indexes design document.

In Hive 0.12.0 and earlier releases, the index name is case-sensitive for CREATE INDEX and DROP INDEX statements. However, ALTER INDEX requires an index name that was created with lowercase letters (see HIVE-2752). This bug is fixed in Hive 0.13.0 by making index names case-insensitive for all HiveQL statements. For releases prior to 0.13.0, the best practice is to use lowercase letters for all index names.

Create Index

CREATE INDEX index_name
   ON TABLE base_table_name (col_name, ...)
   AS index_type
   [WITH DEFERRED REBUILD]
   [IDXPROPERTIES (property_name=property_value, ...)]
   [IN TABLE index_table_name]
   [
      [ ROW FORMAT ...] STORED AS ...
      | STORED BY ...
   ]
   [LOCATION hdfs_path]
   [TBLPROPERTIES (...)]
   [COMMENT  "index comment" ];

CREATE INDEX creates an index on a table using the given list of columns as keys. See CREATE INDEX in the Indexes design document.

Drop Index

DROP INDEX [IF EXISTS] index_name ON table_name;

DROP INDEX drops the index and also deletes the index table.

In Hive 0.7.0 or later, DROP returns an error if the index doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

Alter Index

ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;

ALTER INDEX ... REBUILD builds an index that was created using the WITH DEFERRED REBUILD clause, or rebuilds a previously built index. If PARTITION is specified, only that partition is rebuilt.

Create/Drop Macro

Version information

As of Hive 0.12.0.

Bug fixes:

  • Prior to Hive 1.3.0 and 2.0.0 when a HiveQL macro was used more than once while processing the same row, Hive returned the same result for all invocations even though the arguments were different. (See HIVE-11432.)
  • Prior to Hive 1.3.0 and 2.0.0 when multiple macros were used while processing the same row, an ORDER BY clause could give wrong results. (See HIVE-12277.)
  • Prior to Hive 2.1.0 when multiple macros were used while processing the same row, results of the later macros were overwritten by that of the first. (See HIVE-13372.)

Hive 0.12.0 introduced macros to HiveQL, prior to which they could only be created in Java.

Create Temporary Macro

CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;

CREATE TEMPORARY MACRO creates a macro using the given optional list of columns as inputs to the expression. Macros exist for the duration of the current session.

Examples:
CREATE TEMPORARY MACRO fixed_number() 42;
CREATE TEMPORARY MACRO string_len_plus_two(x string) length(x) + 2;
CREATE TEMPORARY MACRO simple_add(x int, y int) x + y;

Drop Temporary Macro

DROP TEMPORARY MACRO [IF EXISTS] macro_name;

DROP TEMPORARY MACRO returns an error if the function doesn't exist, unless IF EXISTS is specified.

Create/Drop/Reload Function

Temporary Functions

Create Temporary Function

CREATE TEMPORARY FUNCTION function_name AS class_name;

This statement lets you create a function that is implemented by the class_name. You can use this function in Hive queries for as long as the session lasts. You can use any class that is in the Hive classpath. You can add jars to the classpath by executing 'ADD JAR' statements. Please refer to the CLI section Hive Interactive Shell Commands, including Hive Resources, for more information on how to add/delete files from the Hive classpath. Using this, you can register User Defined Functions (UDFs).

Also see Hive Plugins for general information about creating custom UDFs.

Drop Temporary Function

You can unregister a UDF as follows:

DROP TEMPORARY FUNCTION [IF EXISTS] function_name;

In Hive 0.7.0 or later, DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

Permanent Functions

In Hive 0.13 or later, functions can be registered to the metastore, so they can be referenced in a query without having to create a temporary function each session.

Create Function

Version information

As of Hive 0.13.0 (HIVE-6047).

CREATE FUNCTION [db_name.]function_name AS class_name
   [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];

This statement lets you create a function that is implemented by the class_name. Jars, files, or archives which need to be added to the environment can be specified with the USING clause; when the function is referenced for the first time by a Hive session, these resources will be added to the environment as if ADD JAR/FILE had been issued. If Hive is not in local mode, then the resource location must be a non-local URI such as an HDFS location.

The function will be added to the database specified, or to the current database at the time that the function was created. The function can be referenced by fully qualifying the function name (db_name.function_name), or can be referenced without qualification if the function is in the current database.
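A hypothetical example (the database, function, class name, and jar path are all illustrative, not real artifacts):

```sql
CREATE FUNCTION analytics.to_upper AS 'com.example.hive.udf.ToUpper'
   USING JAR 'hdfs:///libs/example-udfs.jar';

-- Callable from any database using the qualified name:
SELECT analytics.to_upper(name) FROM employees;
```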

Drop Function

Version information

As of Hive 0.13.0 (HIVE-6047).

DROP FUNCTION [IF EXISTS] function_name;

DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

Reload Function

Version information

As of Hive 1.2.0 (HIVE-2573).

RELOAD FUNCTION;

As of HIVE-2573, creating permanent functions in one Hive CLI session may not be reflected in HiveServer2 or other Hive CLI sessions, if they were started before the function was created. Issuing RELOAD FUNCTION within a HiveServer2 or HiveCLI session will allow it to pick up any changes to the permanent functions that may have been done by a different HiveCLI session.

Create/Drop/Grant/Revoke Roles and Privileges

Hive deprecated authorization mode / Legacy Mode has information about these DDL statements for the legacy authorization model.

For SQL standard based authorization in Hive 0.13.0 and later releases, see SQL Standard Based Hive Authorization for the corresponding DDL statements.

Show

These statements provide a way to query the Hive metastore for existing data and metadata accessible to this Hive system.

Show Databases

SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];

SHOW DATABASES or SHOW SCHEMAS lists all of the databases defined in the metastore. The uses of SCHEMAS and DATABASES are interchangeable – they mean the same thing.

The optional LIKE clause allows the list of databases to be filtered using a regular expression. Wildcards in the regular expression can only be '*' for any character(s) or '|' for a choice. Examples are 'employees', 'emp*', 'emp*|*ees', all of which will match the database named 'employees'.

Show Tables/Views/Partitions/Indexes

Show Tables

SHOW TABLES [IN database_name] ['identifier_with_wildcards'];

SHOW TABLES lists all the base tables and views in the current database (or the one explicitly named using the IN clause) with names matching the optional regular expression. Wildcards in the regular expression can only be '*' for any character(s) or '|' for a choice. Examples are 'page_view', 'page_v*', and '*view|page*', all of which will match the 'page_view' table. Matching tables are listed in alphabetical order. It is not an error if no matching tables are found in the metastore. If no regular expression is given, all tables in the selected database are listed.
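Examples (database and patterns are illustrative):

```sql
SHOW TABLES;                      -- all tables and views in the current database
SHOW TABLES 'page_*';             -- tables/views whose names start with "page_"
SHOW TABLES IN db1 '*view';       -- tables/views in database db1 whose names end in "view"
```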

Show Views

Version information

Introduced in Hive 2.2.0 via HIVE-14558.

SHOW VIEWS [IN/FROM database_name] [LIKE 'pattern_with_wildcards'];

SHOW VIEWS lists all the views in the current database (or the one explicitly named using the IN or FROM clause) whose names match the optional regular expression. Wildcards in the regular expression can only be '*' for any character(s) or '|' for a choice. Examples are 'page_view', 'page_v*', and '*view|page*', all of which will match the 'page_view' view. Matching views are listed in alphabetical order. It is not an error if no matching views are found in the metastore. If no regular expression is given, then all views in the selected database are listed.

Examples
SHOW VIEWS;                                -- show all views in the current database
SHOW VIEWS 'test_*';                       -- show all views that start with "test_"
SHOW VIEWS '*view2';                       -- show all views that end in "view2"
SHOW VIEWS LIKE 'test_view1|test_view2';   -- show views named either "test_view1" or "test_view2"
SHOW VIEWS FROM test1;                     -- show views from database test1
SHOW VIEWS IN test1;                       -- show views from database test1 (FROM and IN are the same)
SHOW VIEWS IN test1 "test_*";              -- show views from database test1 that start with "test_"

Show Partitions

SHOW PARTITIONS table_name;

SHOW PARTITIONS lists all the existing partitions for a given base table. Partitions are listed in alphabetical order.

Version information

As of Hive 0.6, SHOW PARTITIONS can filter the list of partitions as shown below.

It is also possible to specify parts of a partition specification to filter the resulting list.

Examples:
SHOW PARTITIONS table_name PARTITION(ds='2010-03-03');            -- (Note: Hive 0.6 and later)
SHOW PARTITIONS table_name PARTITION(hr='12');                    -- (Note: Hive 0.6 and later)
SHOW PARTITIONS table_name PARTITION(ds='2010-03-03', hr='12');   -- (Note: Hive 0.6 and later)

Version information

Starting with Hive 0.13.0, SHOW PARTITIONS can specify a database (HIVE-5912).

SHOW PARTITIONS [db_name.]table_name [PARTITION(partition_spec)];   -- (Note: Hive 0.13.0 and later)
Example:
SHOW PARTITIONS databaseFoo.tableBar PARTITION(ds='2010-03-03', hr='12');   -- (Note: Hive 0.13.0 and later)

Show Table/Partition Extended

SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)];

SHOW TABLE EXTENDED will list information for all tables matching the given regular expression. Users cannot use a regular expression for the table name if a partition specification is present. This command's output includes basic table information and file system information like totalNumberFiles, totalFileSize, maxFileSize, minFileSize, lastAccessTime, and lastUpdateTime. If a partition is specified, it will output the given partition's file system information instead of the table's file system information.

 

Example
hive> show table extended like part_table;
OK
tableName:part_table
owner:thejas
location:file:/tmp/warehouse/part_table
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 i}
partitioned: true
partitionColumns:struct partition_columns { string d}
totalNumberFiles:1
totalFileSize:2
maxFileSize:2
minFileSize:2
lastAccessTime:0
lastUpdateTime:1459382233000

Show Table Properties

Version information

As of Hive 0.10.0.

SHOW TBLPROPERTIES tblname;
SHOW TBLPROPERTIES tblname("foo");

The first form lists all of the table properties for the table in question, one per row separated by tabs. The second form of the command prints only the value for the property that's being asked for.
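
As a sketch, assuming a hypothetical table page_view with a "comment" property set:

Examples:
SHOW TBLPROPERTIES page_view;             -- all properties, one per row
SHOW TBLPROPERTIES page_view("comment");  -- only the value of the "comment" property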

For more information, see the TBLPROPERTIES clause in Create Table above.

Show Create Table

Version information

As of Hive 0.10.

SHOW CREATE TABLE ([db_name.]table_name|view_name);

SHOW CREATE TABLE shows the CREATE TABLE statement that creates a given table, or the CREATE VIEW statement that creates a given view.
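
For example, assuming a hypothetical view page_view_summary in database db1:

Example:
SHOW CREATE TABLE db1.page_view_summary;   -- prints the CREATE VIEW statement that defined it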

Show Indexes

Version information

As of Hive 0.7.

SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];

SHOW INDEXES shows all of the indexes on a certain table, as well as information about them: index name, table name, names of the columns used as keys, index table name, index type, and comment. If the FORMATTED keyword is used, then column titles are printed for each column.
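
As a sketch, assuming a hypothetical table page_view with indexes, in database db1:

Examples:
SHOW INDEXES ON page_view;                    -- indexes on page_view in the current database
SHOW FORMATTED INDEXES ON page_view IN db1;   -- same, with column titles printed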

Show Columns

Version information

As of Hive 0.10.

SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];

SHOW COLUMNS shows all the columns in a table including partition columns.

 

Version information

Added in Hive 3.0 by HIVE-18373.

SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name] [LIKE 'pattern_with_wildcards'];

SHOW COLUMNS lists all the columns in the table whose names match the optional regular expression. Wildcards in the regular expression can only be '*' for any character(s) or '|' for a choice. Examples are 'cola', 'col*', and '*a|col*', all of which will match the 'cola' column. Matching columns are listed in alphabetical order. It is not an error if no matching columns are found in the table. If no regular expression is given, then all columns in the selected table are listed.


Examples
-- SHOW COLUMNS
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE foo(col1 INT, col2 INT, col3 INT, cola INT, colb INT, colc INT, a INT, b INT, c INT);

-- SHOW COLUMNS basic syntax
SHOW COLUMNS FROM foo;                            -- show all columns in foo
SHOW COLUMNS FROM foo "*";                        -- show all columns in foo
SHOW COLUMNS IN foo "col*";                       -- show columns in foo starting with "col"                  OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS FROM foo '*c';                       -- show columns in foo ending with "c"                      OUTPUT c,colc
SHOW COLUMNS FROM foo LIKE "col1|cola";           -- show columns in foo named either col1 or cola            OUTPUT col1,cola
SHOW COLUMNS FROM foo FROM test_db LIKE 'col*';   -- show columns in foo starting with "col"                  OUTPUT col1,col2,col3,cola,colb,colc
SHOW COLUMNS IN foo IN test_db LIKE 'col*';       -- show columns in foo starting with "col" (FROM/IN same)   OUTPUT col1,col2,col3,cola,colb,colc

-- Non-existing column patterns resulting in no match
SHOW COLUMNS IN foo "nomatch*";
SHOW COLUMNS IN foo "col+";                       -- the + wildcard is not supported
SHOW COLUMNS IN foo "nomatch";


Show Functions

SHOW FUNCTIONS "a.*";

SHOW FUNCTIONS lists all the user-defined and built-in functions matching the regular expression. To get all functions, use ".*".

Show Granted Roles and Privileges

Hive deprecated authorization mode / Legacy Mode has information about these SHOW statements:

In Hive 0.13.0 and later releases, SQL standard based authorization has these SHOW statements:

Show Locks

SHOW LOCKS <table_name>;
SHOW LOCKS <table_name> EXTENDED;
SHOW LOCKS <table_name> PARTITION (<partition_spec>);
SHOW LOCKS <table_name> PARTITION (<partition_spec>) EXTENDED;
SHOW LOCKS (DATABASE|SCHEMA) database_name;     -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

SHOW LOCKS displays the locks on a table or partition. See Hive Concurrency Model for information about locks.

SHOW LOCKS (DATABASE|SCHEMA) is supported from Hive 0.13 for DATABASE (see HIVE-2093) and Hive 0.14 for SCHEMA (see HIVE-6601). SCHEMA and DATABASE are interchangeable – they mean the same thing.

When Hive transactions are being used, SHOW LOCKS returns this information (see HIVE-6460):

  • database name
  • table name
  • partition name (if the table is partitioned)
  • the state the lock is in, which can be:
    • "acquired" – the requestor holds the lock
    • "waiting" – the requestor is waiting for the lock
    • "aborted" – the lock has timed out but has not yet been cleaned up
  • Id of the lock blocking this one, if this lock is in "waiting" state
  • the type of lock, which can be:
    • "exclusive" – no one else can hold the lock at the same time (obtained mostly by DDL operations such as drop table)
    • "shared_read" – any number of other shared_read locks can lock the same resource at the same time (obtained by reads; confusingly, an insert operation also obtains a shared_read lock)
    • "shared_write" – any number of shared_read locks can lock the same resource at the same time, but no other shared_write locks are allowed (obtained by update and delete)
  • ID of the transaction this lock is associated with, if there is one
  • last time the holder of this lock sent a heartbeat indicating it was still alive
  • the time the lock was acquired, if it has been acquired
  • Hive user who requested the lock
  • host the user is running on
  • agent info – a string that helps identify the entity that issued the lock request. For a SQL client this is the query ID; for a streaming client it may be, for example, a Storm bolt ID.

Show Conf

Version information

As of Hive 0.14.0.

SHOW CONF <configuration_name>;

SHOW CONF returns a description of the specified configuration property, including:

  • default value
  • required type
  • description

Note that SHOW CONF does not show the current value of a configuration property. For current property settings, use the "set" command in the CLI or a HiveQL script (see Commands) or in Beeline (see Beeline Hive Commands).
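
For example, using the built-in property hive.exec.parallel:

Example:
SHOW CONF 'hive.exec.parallel';   -- prints its default value, type, and description
SET hive.exec.parallel;           -- by contrast, prints its current value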

Show Transactions

Version information

As of Hive 0.13.0 (see Hive Transactions).

SHOW TRANSACTIONS;

SHOW TRANSACTIONS is for use by administrators when Hive transactions are being used. It returns a list of all currently open and aborted transactions in the system, including this information:

  • transaction ID
  • transaction state
  • user who started the transaction
  • machine where the transaction was started
  • timestamp when the transaction was started (as of Hive 2.2.0)
  • timestamp for last heartbeat (as of Hive 2.2.0)

Show Compactions

Version information

As of Hive 0.13.0 (see Hive Transactions).

SHOW COMPACTIONS;

SHOW COMPACTIONS returns a list of all tables and partitions currently being compacted or scheduled for compaction when Hive transactions are being used, including this information:

  • "CompactionId" - unique internal id (As of Hive 3.0)
  • "Database" - Hive database name
  • "Table" - table name
  • "Partition" - partition name (if the table is partitioned)
  • "Type" - whether it is a major or minor compaction
  • "State" - the state the compaction is in, which can be:
    • "initiated" – waiting in the queue to be compacted
    • "working" – being compacted
    • "ready for cleaning" – the compaction has been done and the old files are scheduled to be cleaned
    • "failed" – the job failed. The metastore log will have more detail.
    • "succeeded" – A-ok
    • "attempted" – initiator attempted to schedule a compaction but failed. The metastore log will have more information.
  • "Worker" - thread ID of the worker thread doing the compaction (only if in working state)
  • "Start Time" - the time at which the compaction started (only if in working or ready for cleaning state)
  • "Duration(ms)" - time this compaction took (As of Hive 2.2 )
  • "HadoopJobId" - Id of the submitted Hadoop job (As of Hive 2.2)

Compactions are initiated automatically, but can also be initiated manually with an ALTER TABLE COMPACT statement.

Describe

Describe Database

Version information

As of Hive 0.7.

DESCRIBE DATABASE [EXTENDED] db_name;
DESCRIBE SCHEMA [EXTENDED] db_name;     -- (Note: Hive 1.1.0 and later)

DESCRIBE DATABASE shows the name of the database, its comment (if one has been set), and its root location on the filesystem. The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. DESCRIBE SCHEMA is added in Hive 1.1.0 (HIVE-8803).

EXTENDED also shows the database properties.
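
As a sketch, assuming a hypothetical database test_db:

Examples:
DESCRIBE DATABASE test_db;            -- shows name, comment, and root location
DESCRIBE DATABASE EXTENDED test_db;   -- additionally shows DBPROPERTIES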

Describe Table/View/Column

There are two formats for the describe table/view/column syntax, depending on whether or not the database is specified.

If the database is not specified, the optional column information is provided after a dot:

DESCRIBE [EXTENDED|FORMATTED]
   table_name[.col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
                                         -- (Note: Hive 1.x.x and 0.x.x only. See "Hive 2.0+: New Syntax" below)

If the database is specified, the optional column information is provided after a space:

DESCRIBE [EXTENDED|FORMATTED]
   [db_name.]table_name [col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
                                         -- (Note: Hive 1.x.x and 0.x.x only. See "Hive 2.0+: New Syntax" below)

DESCRIBE shows the list of columns including partition columns for the given table. If the EXTENDED keyword is specified then it will show all the metadata for the table in Thrift serialized form. This is generally only useful for debugging and not for general use. If the FORMATTED keyword is specified, then it will show the metadata in a tabular format.

Note: DESCRIBE EXTENDED shows the number of rows only if statistics were gathered when the data was loaded (see Newly Created Tables), and if the Hive CLI is used instead of a Thrift client or Beeline. HIVE-6285 will address this issue. Although ANALYZE TABLE gathers statistics after the data has been loaded (see Existing Tables), it does not currently provide information about the number of rows.

If a table has a complex column then you can examine the attributes of this column by specifying table_name.complex_col_name (and field_name for an element of a struct, '$elem$' for array element, '$key$' for map key, and '$value$' for map value). You can specify this recursively to explore the complex column type.

For a view, DESCRIBE EXTENDED or FORMATTED can be used to retrieve the view's definition. Two relevant attributes are provided: both the original view definition as specified by the user, and an expanded definition used internally by Hive.

Version information — partition & non-partition columns

In Hive 0.10.0 and earlier, no distinction is made between partition columns and non-partition columns while displaying columns for DESCRIBE TABLE. From Hive 0.12.0 onwards, they are displayed separately.

In Hive 0.13.0 and later, the configuration parameter hive.display.partition.cols.separately lets you use the old behavior, if desired (HIVE-6689). For an example, see the test case in the patch for HIVE-6689.

Bug fixed in Hive 0.10.0 — database qualifiers

Database qualifiers for table names were introduced in Hive 0.7.0, but they were broken for DESCRIBE until a bug fix in Hive 0.10.0 (HIVE-1977).

Bug fixed in Hive 0.13.0 — quoted identifiers

Prior to Hive 0.13.0 DESCRIBE did not accept backticks (`) surrounding table identifiers, so DESCRIBE could not be used for tables with names that matched reserved keywords (HIVE-2949 and HIVE-6187). As of 0.13.0, all identifiers specified within backticks are treated literally when the configuration parameter hive.support.quoted.identifiers has its default value of "column" (HIVE-6013). The only exception is that double backticks (``) represent a single backtick character.

Display Column Statistics

Version information

As of Hive 0.14.0; see HIVE-7050 and HIVE-7051. (The FOR COLUMNS option of ANALYZE TABLE is available as of Hive 0.10.0.)

ANALYZE TABLE table_name COMPUTE STATISTICS FOR COLUMNS will compute column statistics for all columns in the specified table (and for all partitions if the table is partitioned). To view the gathered column statistics, the following statements can be used:

DESCRIBE FORMATTED [db_name.]table_name column_name;                              -- (Note: Hive 0.14.0 and later)
DESCRIBE FORMATTED [db_name.]table_name column_name PARTITION (partition_spec);   -- (Note: Hive 0.14.0 to 1.x.x)
                                                                                  -- (see "Hive 2.0+: New Syntax" below)

See Statistics in Hive: Existing Tables for more information about the ANALYZE TABLE command.
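
As a sketch, assuming a hypothetical table foo with an INT column col1:

Example:
ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS;   -- gather column statistics first
DESCRIBE FORMATTED foo col1;                        -- then view the statistics gathered for col1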

Describe Partition

There are two formats for the describe partition syntax, depending on whether or not the database is specified.

If the database is not specified, the optional column information is provided after a dot:

DESCRIBE [EXTENDED|FORMATTED] table_name[.column_name] PARTITION partition_spec;
                                         -- (Note: Hive 1.x.x and 0.x.x only. See "Hive 2.0+: New Syntax" below)

If the database is specified, the optional column information is provided after a space:

DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name [column_name] PARTITION partition_spec;
                                         -- (Note: Hive 1.x.x and 0.x.x only. See "Hive 2.0+: New Syntax" below)

This statement lists metadata for a given partition. The output is similar to that of DESCRIBE table_name. Presently, the column information associated with a particular partition is not used while preparing plans. As of Hive 1.2 (HIVE-10307), the partition column values specified in partition_spec are type validated, converted and normalized to their column types when hive.typecheck.on.insert is set to true (default). These values can be number literals.

Example:
hive> show partitions part_table;
OK
d=abc
 
 
hive> DESCRIBE extended part_table partition (d='abc');
OK
i                        int                                        
d                       string                                     
                  
# Partition Information         
# col_name              data_type               comment            
                  
d                       string                                     
                  
Detailed Partition Information  Partition(values:[abc], dbName:default, tableName:part_table, createTime:1459382234, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null), FieldSchema(name:d, type:string, comment:null)], location:file:/tmp/warehouse/part_table/d=abc, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1459382234, numRows=1, totalSize=2, rawDataSize=1})
Time taken: 0.325 seconds, Fetched: 9 row(s)
 
 
hive> DESCRIBE formatted part_table partition (d='abc');
OK
# col_name              data_type               comment            
                  
i                        int                                        
                  
# Partition Information         
# col_name              data_type               comment            
                  
d                       string                                     
                  
# Detailed Partition Information                
Partition Value:        [abc]
Database:               default
Table:                  part_table
CreateTime:             Wed Mar 30 16:57:14 PDT 2016    
LastAccessTime:         UNKNOWN                 
Protect Mode:           None                    
Location:               file:/tmp/warehouse/part_table/d=abc    
Partition Parameters:           
         COLUMN_STATS_ACCURATE    true               
         numFiles                1                  
         numRows                 1                  
         rawDataSize             1                  
         totalSize               2                  
         transient_lastDdlTime   1459382234         
                  
# Storage Information           
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe      
InputFormat:            org.apache.hadoop.mapred.TextInputFormat        
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat      
Compressed:              No                      
Num Buckets:            -1                      
Bucket Columns:         []                      
Sort Columns:           []                      
Storage Desc Params:
         serialization.format    1                  
Time taken: 0.334 seconds, Fetched: 35 row(s)

Hive 2.0+: Syntax Change

Hive 2.0+: New syntax

From the Hive 2.0 release onward, the describe table command has a syntax change which is backward incompatible. See HIVE-12184 for details.
DESCRIBE [EXTENDED | FORMATTED]
    [db_name.]table_name [PARTITION partition_spec] [col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];

Warning: The new syntax could break current scripts.

  • It no longer accepts a DOT-separated table_name and column_name; they must be SPACE-separated. The database name and table name remain DOT-separated. column_name can still contain DOTs for complex datatypes.
  • Optional partition_spec has to appear after the table_name but prior to the optional column_name. In the previous syntax, column_name appears in between table_name and partition_spec.
Examples:
DESCRIBE FORMATTED default.src_table PARTITION (part_col = 100) columnA;
DESCRIBE default.src_thrift lintString.$elem$.myint;

Abort

Abort Transactions

Version information

As of Hive 1.3.0 and 2.1.0 (HIVE-12634).

ABORT TRANSACTIONS transactionID [ transactionID ...];

ABORT TRANSACTIONS cleans up the specified transaction IDs from the Hive metastore so that users do not need to interact with the metastore directly in order to remove dangling or failed transactions. ABORT TRANSACTIONS is added in Hive 1.3.0 and 2.1.0 (HIVE-12634).

Example:
ABORT TRANSACTIONS 0000007 0000008 0000010 0000015;

This command can be used together with SHOW TRANSACTIONS. The latter can help figure out the candidate transaction IDs to be cleaned up.

HCatalog and WebHCat DDL

For information about DDL in HCatalog and WebHCat, see the HCatalog and WebHCat documentation.


Reposted from blog.csdn.net/maizi1045/article/details/79724397