Hive SQL与标准SQL存在一些差异,但也是大同小异,HQL的基本语法为:
[ ] 中内容是可选的,{ }中内容是必选的,| 表示内容二选一,全大写单词为关键字
建表语法:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, ...) [STORED BY (col_name [ASC|DESC], ...) ] INTO num_buckets BUCKETS]
[
[ROW FORMAT row_format] [STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] (Note: only available starting with 0.6.0)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] (Note: only available starting with 0.5.0)
[AS select_statement] (Note: only available starting with 0.5.0)
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
LIKE existing_table_name
[LOCATION hdfs_path]
data_type:
: primitive_type
| array_type
| map_type
| struct_type
primitive_type:
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| STRING
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ... >
row_format
: DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERED serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)]
file_format
: SEQUENCEFILE
| TEXTFILE
| RCFILE (Note: only available starting with 0.6.0)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
修改表语法:
(1) 添加分区
ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [LOCATION 'location'] partition_spec [LOCATION 'location2'] ...
partition_spec
: PARTITION (partition_col = partition_val, partition_col2 = partition_val2, ...)
(2) 删除分区
ALTER TABLE table_name DROP partition_spec, partition_spec, ...
(3) 表重命名
ALTER TABLE table_name RENAME TO new_table_name
(4) 修改字段
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name col_type [COMMENT col_comment] [FIRST|AFTER col_name]
(5) 新增或替换字段
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
ADD是代表新增一字段,字段位置在所有列后面(partition 列前);REPLACE 则是表示替换表中所有字段。
建视图语法
CREATE VIEW [IF NOT EXISTS] view_name [(col_name [COMMENT col_comment], .. )]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...
查看表语法
(1) 查看当前库[指定库]中所有表名
SHOW TABLES [IN database] ;
(2) 查看表名,部分匹配
SHOW TABLES 'page.*' ;
SHOW TABLES '.*view' ;
(3) 查看某张表的所有Partition,如果没有就报错:
SHOW PARTITIONS table_name ;
(4) 查看某张表的结构:
DESC [EXTENDED | FORMATTED] table_name ;
(5) 查看分区内容
SELECT * FROM table_name WHERE partition_col = partition_value ;
加载数据语法:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE table_name [PARTITION (partition_col = partition_value, partition_col2 = parititon_value2 , ...)]
插入语法:
1、查询结果插入到Hive表中
(1) 标准语法:
INSERT OVERWRITE TABLE table_name [PARTITION (partition_col = partition_val, partition_col2 = partition_val2)] select_statement FROM from_statement ;
(2) 扩展(插入到多张表):
FROM from_statement
INSERT OVERWRITE TABLE table_name1 [PARTITION partition_col = partiiton_val, partition_col2 = partition_val2 ... ] select_statement
[INSERT OVERWRITE TABLE table_name2 [PARTITION partition_col ...] select_statement2] ...
(3) 扩展 (动态分区插入数据) :
INSERT OVERWRITE TABLE table_name PARTITION (partiition_col[=partition_val], partition_col2[=partition_val2] select_statement FROM from_statement
2、查询结果写入文件中
(1) 标准语法:
INSERT OVERWRITE [LOCAL] DIRECTORY directory select_statement ;
(2) 扩展(写入到多个文件);
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
删除语法:
DROP TABLE [IF EXISTS] table_name ;
删除一个内部表会同时删除表的元数据和数据;删除一个外部表,只删除元数据而保留数据。
查询语法:
SELECT [ALL | DISTINCT] select_expr, select_expr2, ...
FROM table_name
[WHERE where_condition]
[GROUP BY col_list]
[
[CLUSTER BY col_list | DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
关联语法:
table_reference [INNER] JOIN table_reference [join_condition]
| table_reference {LEFT | RIGHT | FULL} [OUTER] JOIN table_reference join_condition
| table_reference LEFT SEMI JOIN table_reference join_condition
table_reference
: table_name
| table_subquery alias
| (table_references)
join_condition
: ON expr = expr [AND expr2=expr2]...