Hive学习(三)------创建hive内部表和外部表

1.hive创建表格语法

hive官网创建表语法路径

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
 
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
column_constraint_specification:
  : [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK  [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
 
default_value:
  : [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ] 
 
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE 
    [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]

2.hive创建内部表

内部表:
在这里插入图片描述

2.1默认行格式(ROW FORMAT)创建内部表

数据格式:
在这里插入图片描述
基本语法:

create table person1
(
id int,
name string,
hobby array<string>,
address map<string,string>
);
或
(不常用)
create table person2
(
id int,
name string,
hobby array<string>,
address map<string,string>
)
ROW FORMAT delimited
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003';



//用文件的方式插入数据
load data local inpath '/date/date' into table person2;

在这里插入图片描述

注意:默认的行格式中,以^ A(linux文本书写中为Ctrl+V,Ctrl+A)----------^ H(Ctrl+V,Ctrl+H),不能超过^ H。其中^ A 也与’\001’等效,^ B与’\002’等效,以此类推。

什么时候需要用到^ A等行格式?

当表的属性比较复杂时,如array<map<string,array>>时,自定义分隔符将很难进行界定各属性,必须用此界定符。

为什么用文件方式插入数据,而不是insert into的方式?
因为insert into的方式插入数据需要提交job作业给mr,mr进行map,shuffle,reduce。其中shuffle阶段,频繁的与I/O(网络I/O,磁盘I/O)交互,造成耗费大量时间。

2.2自定义行格式(ROW FORMAT)创建内部表

数据格式:
1,小明1,reading-basketball-football,shanghai:pudong-hubei:wuhan
2,小明2,reading-basketball-football,shanghai:pudong-hubei:wuhan
3,小明3,reading-basketball-football,shanghai:pudong-hubei:wuhan
4,小明4,reading-basketball-football,shanghai:pudong-hubei:wuhan
5,小明5,reading-basketball-football,shanghai:pudong-hubei:wuhan
6,小明6,reading-basketball-football,shanghai:pudong-hubei:wuhan
7,小明7,reading-basketball-football,shanghai:pudong-hubei:wuhan
8,小明8,reading-basketball-football,shanghai:pudong-hubei:wuhan
9,小明9,reading-basketball-football,shanghai:pudong-hubei:wuhan

语法:

create table person
(
id int,
name string,
hobby array<string>,
address map<string,string>
)
ROW FORMAT delimited
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':';


//以文件方式插入数据
load data local inpath '/date/date' into table person;

在这里插入图片描述

3.创建外部表

在这里插入图片描述
创建外部表的语法与内部表的语法大致相同,只有个别差异。

数据格式:
1,小明1,reading-basketball-football,shanghai:pudong-hubei:wuhan
2,小明2,reading-basketball-football,shanghai:pudong-hubei:wuhan
3,小明3,reading-basketball-football,shanghai:pudong-hubei:wuhan
4,小明4,reading-basketball-football,shanghai:pudong-hubei:wuhan
语法:

create EXTERNAL table person3
(
id int,
name string,
hobby array<string>,
address map<string,string>
)
ROW FORMAT delimited
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':'
//设置数据存储在hdfs的路径
LOCATION '/test';

在这里插入图片描述
注意开头和末尾的差异。并且外部表不需要导入数据,会根据你设置的hdfs的路径下的文件为数据,创建表

4.内部表和外部表的区别

1、创建表的时候,内部表直接存储再默认的hdfs路径,外部表需要自己指定路径
2、删除表的时候,内部表会将数据和元数据全部删除,外部表只删除元数据,数据不删除

删除内部表person和外部表person3
在这里插入图片描述mysql中存档元数据都被删除
在这里插入图片描述在这里插入图片描述外部表的数据依然存在
在这里插入图片描述内部表的数据被删除
在这里插入图片描述

注 意:hive:读时检查(实现解耦(hdfs和hive职责分离,hdfs负责存储,hive负责解析显示。任何文件都可以存入hdfs,但是hive只能解析特定格式的文件,不符合格式,将显示为NULL),提高数据记载的效率)
关系型数据库:写时检查

5.创建struct结构体类型(扩展)

hive> create table t (id structid1:int,id2:int,id3:int,name array,xx map<int,string>)
row format delimited
fields terminated by ‘\t’
lines terminated by ‘\n’
collection items terminated by ‘,’
map keys terminated by ‘:’;
FAILED: ParseException line 5:0 missing EOF at ‘collection’ near ‘’\n’’

for example:

编辑本地文件:f.txt 内容为:1,2 1,2,3,4,5 1:value_1,2:value_2

导入本地文件:load data local inpath ‘/f.txt’ into table t;

查询表数据:select * from t;

{“id1”:1,“id2”:2} [“1”,“2”,“3”,“4”,“5”] {1:“value_1”,2:“value_2”}

发布了19 篇原创文章 · 获赞 1 · 访问量 327

猜你喜欢

转载自blog.csdn.net/qq_43719634/article/details/102470273