Hive DDL and DML Operations

Reference: the official Apache Hive documentation.

Overview

Hive data types

Primitive types

tinyint    smallint    int    bigint    boolean    float    double    string

Complex types

array    map    struct

Create/Drop/Alter/Use Database

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
   [COMMENT database_comment]
   [LOCATION hdfs_path]
   [WITH DBPROPERTIES (property_name=property_value, ...)];

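Example

create database if not exists mybase
comment "this is mybase"
location "hdfs:/user/";

LOCATION is optional. By default the database directory is created under the path given by hive.metastore.warehouse.dir in hive-site.xml, whose default value is /user/hive/warehouse.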


CREATE DATABASE <DB_NAME> WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2');

The database properties can then be viewed with:

DESC DATABASE EXTENDED <DB_NAME>;
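
As a minimal sketch of the two statements above, the following creates a database with two properties and then reads them back. The database name demo_db and the property keys and values are hypothetical, chosen only for illustration.

```sql
-- Hypothetical database name and properties, for illustration only.
CREATE DATABASE IF NOT EXISTS demo_db
COMMENT 'demo database'
WITH DBPROPERTIES ('creator' = 'alice', 'purpose' = 'testing');

-- EXTENDED adds the DBPROPERTIES to the description output.
DESCRIBE DATABASE EXTENDED demo_db;
```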

Drop Database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

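Example

drop database if exists mybase cascade;

By default, Hive refuses to drop a database that still contains tables. To drop it, either drop all of its tables first or add the CASCADE keyword, in which case Hive drops the tables in the database itself. RESTRICT is the default behavior: if any table exists, the database cannot be dropped.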

Alter Database

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. ALTER SCHEMA was added in Hive 0.14 (HIVE-6601).

The ALTER DATABASE ... SET LOCATION statement does not move the contents of the database's current directory to the newly specified location. It does not change the locations associated with any tables/partitions under the specified database. It only changes the default parent-directory where new tables will be added for this database. This behaviour is analogous to how changing a table-directory does not move existing partitions to a different location.

No other metadata about a database can be changed.
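
A short sketch of the three ALTER DATABASE forms follows. The database demo_db, the user alice, and the NameNode address are all hypothetical, and SET LOCATION assumes a Hive version that supports it (2.2.1 or later, per the note above).

```sql
-- Add or overwrite a database property (hypothetical key and value).
ALTER DATABASE demo_db SET DBPROPERTIES ('edited-by' = 'alice');

-- Transfer ownership to a (hypothetical) user.
ALTER DATABASE demo_db SET OWNER USER alice;

-- Change the default parent directory for tables created from now on.
-- The NameNode host and port here are assumptions, not values from this article.
ALTER DATABASE demo_db SET LOCATION 'hdfs://namenode:8020/warehouse/demo_db_v2';
```

Tables that already exist in demo_db stay where they are; only tables created after the last statement default to the new directory.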


Use Database

USE database_name;
USE DEFAULT;

Show Databases

show databases;

Describe a Database

desc database databasename;

desc database extended databasename;   -- also shows the properties set with WITH DBPROPERTIES

----------------------------------------------------------------------------------------------------------------------------------

Create/Drop/Truncate Table

Create Table


CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
   [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
   [COMMENT table_comment]
   [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
   [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
   [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
      ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
      [STORED AS DIRECTORIES]
   [
    [ROW FORMAT row_format] 
    [STORED AS file_format]
      | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
   ]
   [LOCATION hdfs_path]
   [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
   [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
   LIKE existing_table_or_view_name
   [LOCATION hdfs_path];
 
data_type
   : primitive_type
   | array_type
   | map_type
   | struct_type
   | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
   : TINYINT
   | SMALLINT
   | INT
   | BIGINT
   | BOOLEAN
   | FLOAT
   | DOUBLE
   | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
   | STRING
   | BINARY      -- (Note: Available in Hive 0.8.0 and later)
   | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
   | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
   | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
   | DATE        -- (Note: Available in Hive 0.12.0 and later)
   | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
   | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
   : ARRAY < data_type >
 
map_type
   : MAP < primitive_type, data_type >
 
struct_type
   : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
    : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
   : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char ]] [COLLECTION ITEMS TERMINATED BY char ]
         [MAP KEYS TERMINATED BY char ] [LINES TERMINATED BY char ]
         [NULL DEFINED AS char ]   -- (Note: Available in Hive 0.13 and later)
   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
   : SEQUENCEFILE
   | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
   | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
   | ORC         -- (Note: Available in Hive 0.11.0 and later)
   | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
   | AVRO        -- (Note: Available in Hive 0.14.0 and later)
   | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
constraint_specification:
   : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
     [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]

Example

Data

1,xiaoming,book-tv-code,beijing:sanyuanqiao-shanghai:pudong
2,zhangxianyu,game-code,beijing:tiananmen-shanghai:hupu
3,xiaowang,daiwa-tv,shenyang:hepng-huoxing:xxx

HQL

Table creation, method 1: explicit column definitions

create table psn1 (
id int,
name string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
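
To show how the three delimiters map the sample rows onto the array and map columns, here is a small query sketch. It assumes the sample data above has already been loaded into psn1; the column aliases are made up for illustration.

```sql
-- Assumes the three sample rows above have already been loaded into psn1.
SELECT name,
       likes[0]           AS first_like,    -- array elements are indexed from 0
       size(likes)        AS like_count,    -- number of elements in the array
       address['beijing'] AS beijing_addr   -- NULL when the map has no such key
FROM psn1;
```

For the first sample row this should return xiaoming, book, 3, sanyuanqiao: the field is split on ',' first, array and map entries on '-', and each map entry into key and value on ':'.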


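Table creation, method 2: CREATE TABLE ... LIKE

create table psn2 like psn1;

Even if psn1 contains data, this statement only copies the table definition of psn1 to psn2; no data is copied.

Table creation, method 3: CREATE TABLE ... AS SELECT

create table psn3
as
select id,name,address from psn2;

This form is commonly used to build intermediate tables; the selected data is written into the new table as well.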

---------------------------------------------------------------------------------------------------------------------------------

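Create an External Table

Specify where the data files are stored with LOCATION.

create external table psn11 (
id int,
name string,
likes array<string>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/xxx';

Dropping an external table removes only its metadata; the data files are left in place.

Dropping an internal (managed) table deletes the data files as well.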
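
The difference can be checked with a short sketch like the one below, which reuses the psn11 table and its /user/xxx location from the example above. The re-creation step is an illustration added here, not something from the original walkthrough.

```sql
-- Check whether a table is managed or external:
-- the "Table Type" field reads EXTERNAL_TABLE for psn11.
DESCRIBE FORMATTED psn11;

-- Dropping the external table removes only the metastore entry.
DROP TABLE psn11;

-- Re-creating the table over the same directory makes the untouched
-- files queryable again without reloading them.
CREATE EXTERNAL TABLE psn11 (
  id int,
  name string,
  likes array<string>,
  address map<string,string>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':'
LOCATION '/user/xxx';
```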

---------------------------------------------------------------------------------------------------------------------------------

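Inserting data: row-by-row INSERT is rarely used, because every inserted row launches its own MapReduce job, which is slow and cumbersome.

There are multiple ways to modify data in Hive:

Load

Hive does not perform any transformation while loading data into a table. Load operations are currently pure copy/move operations that move data files into the location corresponding to the Hive table.

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOCAL: Hive copies the data file from the local file system into a temporary HDFS directory, then moves it into the table directory under /user/hive/warehouse. Without LOCAL, filepath refers to a file already on HDFS, which is moved into the table directory.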
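
A minimal sketch of both variants, loading into the psn1 table defined earlier. The file paths are hypothetical.

```sql
-- Local file (hypothetical path): copied into the table directory,
-- the local file itself is kept.
LOAD DATA LOCAL INPATH '/home/hadoop/psn.txt' INTO TABLE psn1;

-- HDFS file (hypothetical path): moved, not copied, into the table directory.
-- OVERWRITE replaces whatever the table contained before.
LOAD DATA INPATH '/tmp/psn.txt' OVERWRITE INTO TABLE psn1;
```

Because the load is a plain file copy or move, the file must already match the table's row format; Hive does not validate or convert it at load time.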


----------------------------------------------------------------------------------------------------------------------------------

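Partitions

A partition column must not also be declared in the table's column list, and LOAD DATA into a partitioned table must name the partition columns as well.

Adding a partition only modifies the table's metadata.

create table computer (
id int,
name string comment 'this is name',
intr array<string>,
detail map<string,string>
)
comment 'computer'
partitioned by (price float)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

Test

The target table is partitioned, so the partition value must be supplied when loading.

load data local inpath '/home/data' into table computer partition (price=1111);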


Add Partitions

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location' ][, PARTITION partition_spec [LOCATION 'location' ], ...];
 
partition_spec:
   : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Example

alter table computer add partition (price=1211);

Drop Partitions

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
   [IGNORE PROTECTION] [PURGE];            -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available in Hive 2.0.0 and later)

partition_spec:
   : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Example

alter table computer drop partition (price=1211);
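
To inspect the result of adding, loading, or dropping partitions, a sketch like the following can be used. It assumes the partitioned computer table from the Partitions section and the price=1111 partition loaded there.

```sql
-- List the partitions currently registered in the metastore.
SHOW PARTITIONS computer;

-- Filtering on the partition column lets Hive read only the matching
-- partition directory instead of scanning the whole table.
SELECT id, name
FROM computer
WHERE price = 1111;
```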

Common insert pattern

from psn1
insert into table psn7
    select count(name);
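
The FROM-first form is useful because one scan of the source table can feed several inserts. The sketch below assumes a hypothetical definition for psn7 (the article does not show one) and adds a second hypothetical target table, psn8.

```sql
-- Hypothetical target tables; the article does not define psn7 or psn8.
CREATE TABLE IF NOT EXISTS psn7 (cnt bigint);
CREATE TABLE IF NOT EXISTS psn8 (id int, name string);

-- psn1 is read once and feeds both inserts.
FROM psn1
INSERT OVERWRITE TABLE psn7
    SELECT count(name)
INSERT INTO TABLE psn8
    SELECT id, name;
```

INSERT OVERWRITE replaces the existing contents of psn7, while INSERT INTO appends to psn8.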













Reposted from blog.csdn.net/paulfrank_zhang/article/details/80713445