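Overview
Materialized views were introduced in Apache Hive 3.0.0. Using materialized views to speed up queries is a long-standing technique in traditional databases. The initial implementation in Apache Hive 3.0.0 focuses on introducing materialized views and automatic query rewriting based on them. A materialized view can be stored natively in Hive or in another system, such as Druid, through a storage handler. For a large class of query expressions containing projections, filters, joins, and aggregations, the optimizer relies on Apache Calcite (on which materialized view rewriting in Hive 3.0 is built) to automatically produce full and partial rewrites.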
Creating a materialized view
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
[DISABLE REWRITE]
[COMMENT materialized_view_comment]
[PARTITIONED ON (col_name, ...)]
[CLUSTERED ON (col_name, ...) | DISTRIBUTED ON (col_name, ...) SORTED ON (col_name, ...)]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
AS SELECT ...;
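The CREATE MATERIALIZED VIEW statement creates a materialized view. If no column names are specified, the names of the view's columns are derived from the defining SELECT expression.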
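As a hedged illustration of the optional clauses, the statement below creates a hypothetical view mv_dept_count over the emps table defined in the example that follows. The view name and comment text are made up for illustration. No column names are listed, so the view's columns should be named deptno and emp_count, taken from the SELECT list and its alias.

```sql
-- Hypothetical view (name and comment text are illustrative only);
-- assumes the emps table created in the example that follows.
-- DISABLE REWRITE: the view starts with automatic query rewriting off.
-- STORED AS ORC:   the materialized rows are written as ORC files.
-- Column names come from the SELECT list: deptno and emp_count.
CREATE MATERIALIZED VIEW IF NOT EXISTS mv_dept_count
DISABLE REWRITE
COMMENT 'Number of employees per department'
STORED AS ORC
AS SELECT deptno, COUNT(*) AS emp_count
FROM emps
GROUP BY deptno;
```

Afterwards, DESC FORMATTED mv_dept_count should list deptno and emp_count as columns and report Rewrite Enabled: No.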
Example
- Create two tables and insert data
> CREATE TABLE emps (
empid INT,
deptno INT,
name VARCHAR(256),
salary FLOAT,
hire_date TIMESTAMP);
> CREATE TABLE depts (
deptno INT,
deptname VARCHAR(256),
locationid INT);
> INSERT INTO TABLE emps VALUES (10001,101,'jane doe',250000,'2018-01-10');
> INSERT INTO TABLE emps VALUES (10002,100,'somporn klailee',210000,'2017-12-25');
> INSERT INTO TABLE emps VALUES (10003,200,'jeiranan thongnopneua',175000,'2018-05-05');
> INSERT INTO TABLE depts VALUES (100,'HR',10);
> INSERT INTO TABLE depts VALUES (101,'Eng',11);
> INSERT INTO TABLE depts VALUES (200,'Sup',20);
- Create a materialized view
> CREATE MATERIALIZED VIEW mv1
AS SELECT empid, deptname, hire_date
FROM emps JOIN depts
ON (emps.deptno = depts.deptno)
WHERE hire_date >= '2017-01-01';
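To check whether the optimizer actually uses mv1, one can EXPLAIN a query written only against the base tables. The query below is a hypothetical example: its filter (hire_date >= '2018-01-01') is stricter than mv1's, so in principle it can be answered from mv1 plus a residual filter. Whether the rewrite really happens depends on the Hive version, on hive.materializedview.rewriting being true, and on whether Hive considers mv1 up to date with respect to its source tables.

```sql
-- Written against the base tables only; mv1 is not mentioned.
-- mv1 keeps rows with hire_date >= '2017-01-01', a superset of what this
-- query needs, so the optimizer may scan mv1 and apply the remaining
-- hire_date filter instead of re-running the join.
EXPLAIN
SELECT empid, deptname
FROM emps JOIN depts
  ON (emps.deptno = depts.deptno)
WHERE hire_date >= '2018-01-01';
```

If the rewrite is applied, the plan should show a table scan over the materialized view (for example test2.mv1) rather than scans of emps and depts followed by a join.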
Viewing materialized views
Using SHOW
SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
Example
> SHOW MATERIALIZED VIEWS;
+------------+--------------------+-----------------+
| mv_name | rewrite_enabled | mode |
+------------+--------------------+-----------------+
| # MV Name | Rewriting Enabled | Mode |
| mv1 | Yes | Manual refresh |
| | NULL | NULL |
+------------+--------------------+-----------------+
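The optional pattern filters the listed views by name. A short sketch, assuming the test2 database that appears in the DESC FORMATTED output below; per the Hive language manual, '*' in the pattern matches any characters and '|' separates alternatives.

```sql
-- Materialized views in database test2 whose names start with "mv"
SHOW MATERIALIZED VIEWS IN test2 'mv*';

-- Materialized views in the current database named mv1 or mv2
-- (mv2 is a hypothetical second view)
SHOW MATERIALIZED VIEWS 'mv1|mv2';
```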
Example: using SHOW CREATE TABLE
> show create table mv1;
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE TABLE `mv1`( |
| `empid` int, |
| `deptname` varchar(256), |
| `hire_date` timestamp) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION |
| 'hdfs://ns001/tmp/hive/mv1' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transient_lastDdlTime'='1569728122') |
+----------------------------------------------------+
Using DESCRIBE
DESCRIBE [EXTENDED | FORMATTED] [db_name.]materialized_view_name;
Example
> DESC FORMATTED mv1;
+----------------------------------+----------------------------------------------------+-------------------------------------------+
| col_name | data_type | comment |
+----------------------------------+----------------------------------------------------+-------------------------------------------+
| # col_name | data_type | comment |
| empid | int | |
| deptname | varchar(256) | |
| hire_date | timestamp | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | test2 | NULL |
| OwnerType: | USER | NULL |
| Owner: | hadoop | NULL |
| CreateTime: | Sun Sep 29 11:35:22 CST 2019 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://ns001/tmp/hive/mv1 | NULL |
| Table Type: | MATERIALIZED_VIEW | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | bucketing_version | 2 |
| | numFiles | 3 |
| | numRows | 3 |
| | rawDataSize | 392 |
| | totalSize | 1285 |
| | transient_lastDdlTime | 1569728122 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| | NULL | NULL |
| # Materialized View Information | NULL | NULL |
| Original Query: | SELECT empid, deptname, hire_date | NULL |
| | | FROM emps JOIN depts |
| | | ON (emps.deptno = depts.deptno) |
| | | WHERE hire_date >= '2017-01-01' |
| Expanded Query: | SELECT `emps`.`empid`, `depts`.`deptname`, `emps`.`hire_date` | NULL |
| | | FROM `test2`.`emps` JOIN `test2`.`depts` |
| | | ON (`emps`.`deptno` = `depts`.`deptno`) |
| | | WHERE `emps`.`hire_date` >= '2017-01-01' |
| Rewrite Enabled: | Yes | NULL |
| Outdated for Rewriting: | Unknown | NULL |
+----------------------------------+----------------------------------------------------+-------------------------------------------+
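Altering a materialized view
Once a materialized view has been created, the optimizer can exploit its definition semantics to rewrite incoming queries automatically and thus speed up their execution. By default, a materialized view is enabled for rewriting when it is created (unless DISABLE REWRITE is specified). Separately, the rewriting algorithm as a whole can be switched on or off globally with the hive.materializedview.rewriting configuration property (default: true). Users can enable or disable rewriting for an individual materialized view with the ALTER MATERIALIZED VIEW statement: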
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE|DISABLE REWRITE;
Example
> ALTER MATERIALIZED VIEW mv1 DISABLE REWRITE;
> SHOW MATERIALIZED VIEWS;
+------------+--------------------+-----------------+
| mv_name | rewrite_enabled | mode |
+------------+--------------------+-----------------+
| # MV Name | Rewriting Enabled | Mode |
| mv1 | No | Manual refresh |
| | NULL | NULL |
+------------+--------------------+-----------------+
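The global property and the per-view flag are independent of each other. A minimal sketch of toggling the global switch for the current session and then re-enabling rewriting for mv1, assuming hive.materializedview.rewriting is not on a restricted-properties list in your deployment:

```sql
-- Session-level switch for the rewriting algorithm as a whole;
-- per-view ENABLE/DISABLE REWRITE flags are not changed by this.
SET hive.materializedview.rewriting=false;

-- Restore the default behavior.
SET hive.materializedview.rewriting=true;

-- Make mv1 eligible for rewriting again after the DISABLE REWRITE above;
-- SHOW MATERIALIZED VIEWS should then report "Yes" for mv1.
ALTER MATERIALIZED VIEW mv1 ENABLE REWRITE;
```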
Dropping a materialized view
DROP MATERIALIZED VIEW [db_name.]materialized_view_name;
Example
> DROP MATERIALIZED VIEW mv1;
> SHOW MATERIALIZED VIEWS;
+----------+------------------+-------+
| mv_name | rewrite_enabled | mode |
+----------+------------------+-------+
+----------+------------------+-------+