41. Introduction and detailed examples of Flink's Hive dialect

Flink series of articles

1. Links to a series of comprehensive articles such as Flink deployment, concept introduction, source, transformation, sink usage examples, introduction and examples of the four cornerstones

13. Basic concepts of Flink's table api and sql, introduction to general api and getting started examples
14. Data types of Flink's table api and sql: built-in data types and their attributes
15. Detailed introduction to dynamic tables, time attribute configuration (how to process update results), temporal tables, joins on streams, determinism on streams, and query configuration
16. Flink's table api and sql connection to external systems: connectors and formats for reading and writing external systems and FileSystem examples (1)
16. Flink's table api and sql connection to external systems: connectors and formats for reading and writing external systems and Elasticsearch examples (2)
16. Flink's table api and sql connection to external systems: connectors and formats for reading and writing external systems and Apache Kafka examples (3)
16. Flink's table api and sql connection to external systems: connectors and formats for reading and writing external systems and JDBC examples (4)

16. Flink's table api and sql connection to external systems: connectors and formats for reading and writing external systems and Apache Hive examples (6)

20. SQL Client of Flink SQL: You can try Flink SQL without writing code, and you can directly submit SQL tasks to the cluster

22. Flink's table api and sql create table DDL
24. Flink's table api and sql catalogs

30. The SQL client of Flink SQL (introduced the use of configuration files-tables, views, etc. through the examples of kafka and filesystem)

33. Flink's hive introduction and simple examples
41. Flink's Hive dialect introduction and detailed examples
42. Flink's table api and sql's Hive Catalog



This article walks through concrete examples of using the Hive dialect in Flink SQL. Every example has been run, and some examples keep their error output on purpose to show what a key command does.
This article assumes that Hadoop, ZooKeeper, Hive, and Flink are installed and working. The examples were run on Flink 1.13 (because the Hadoop cluster environment is based on jdk8, and flink 1.17 requires jdk11).
For more details, see the subsequent introduction to Hive. If your environment cannot run the examples in this article, please refer to: 42. Hive Catalog of Flink's table api and sql

1. Introduction to Hive dialect

Starting from 1.11.0, Flink lets users write SQL statements in Hive syntax when the Hive dialect is enabled. By providing compatibility with Hive syntax, we aim to improve interoperability with Hive and reduce how often users need to switch between Flink and Hive to execute different statements.

1、Using the Hive dialect

Flink currently supports two SQL dialects: default and hive. You must switch to the Hive dialect before you can write statements in Hive syntax. The following shows how to set the dialect with the SQL client and with the Table API. Note that the dialect can be switched dynamically for each statement you execute, so there is no need to restart the session to use a different dialect.

1)、SQL client

The SQL dialect is specified through the table.sql-dialect property, so the initial dialect can be set in the configuration section of the SQL client YAML file.

execution:
  planner: blink
  type: batch
  result-mode: table

configuration:
  table.sql-dialect: hive

Or set the dialect after the SQL client is started.

---- use the Hive dialect
Flink SQL> set table.sql-dialect=hive;
[INFO] Session property has been set.

---- use the default dialect
Flink SQL> set table.sql-dialect=default;
Hive Session ID = 90f6200f-2af7-4045-93fc-9a1fbe77fcfd
[INFO] Session property has been set.

2)、Table API

You can set the dialect for TableEnvironment using Table API.

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner()...build();
TableEnvironment tableEnv = TableEnvironment.create(settings);
// to use hive dialect
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
// to use default dialect
tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);

  • Example
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;
import org.apache.flink.types.Row;

/**
 * Creates a Hive table with the Hive dialect, inserts one row, and queries it.
 *
 * @author alanchan
 */
public class TestCreateHiveTable {

	public static final String tableName = "alan_hivecatalog_hivedb_testTable";
	public static final String hive_create_table_sql = "CREATE  TABLE  " + tableName + " (\n" +
			"  id INT,\n" +
			"  name STRING,\n" +
			"  age INT" + ") " +
			"TBLPROPERTIES (\n" +
			"  'sink.partition-commit.delay'='5 s',\n" +
			"  'sink.partition-commit.trigger'='partition-time',\n" +
			"  'sink.partition-commit.policy.kind'='metastore,success-file'" + ")";

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env);
		String hiveConfDir = "/usr/local/bigdata/apache-hive-3.1.2-bin/conf";
		String name = "alan_hive";
		// name of the default database
		String defaultDatabase = "default";

		HiveCatalog hiveCatalog = new HiveCatalog(name, defaultDatabase, hiveConfDir);
		tenv.registerCatalog(name, hiveCatalog);
		tenv.useCatalog(name);

		// this database must already exist in the Hive metastore
		String newDatabaseName = "alan_hivecatalog_hivedb";
		tenv.useDatabase(newDatabaseName);

		// create the table with the Hive dialect
		tenv.getConfig().setSqlDialect(SqlDialect.HIVE);
		tenv.executeSql(hive_create_table_sql);

		// insert data; executeSql submits the job asynchronously, so wait for it to finish
		String insertSQL = "insert into " + tableName + " values (1,'alan',18)";
		tenv.executeSql(insertSQL).await();

		// query data
		String selectSQL = "select * from " + tableName;
		Table table = tenv.sqlQuery(selectSQL);
		table.printSchema();
		DataStream<Tuple2<Boolean, Row>> result = tenv.toRetractStream(table, Row.class);
		result.print();
		env.execute();
	}

}

2、DDL

This section lists the DDL statements supported by the Hive dialect, focusing mainly on the syntax. For details of the Hive syntax itself, refer to the Hive documentation, or for example to my column on Hive:
3. Detailed explanation of hive usage examples - table creation, detailed explanation of data types, internal and external tables, partition tables, bucket tables

1)、show

SHOW CATALOGS
SHOW CURRENT CATALOG
SHOW DATABASES
SHOW CURRENT DATABASE
SHOW TABLES
SHOW VIEWS
SHOW FUNCTIONS
SHOW MODULES
SHOW FULL MODULES

2)、catalog

For details on working with HiveCatalog, see:
16. Flink’s table api and sql connection to external systems: connectors and formats for reading and writing external systems and Apache Hive examples (6)
42. Flink’s table api and sql Hive Catalog

---- create
CREATE CATALOG alan_hivecatalog WITH (
    'type' = 'hive',
    'default-database' = 'testhive',
    'hive-conf-dir' = '/usr/local/bigdata/apache-hive-3.1.2-bin/conf'
);
---- use
use catalog alan_hivecatalog;

2)、database

---- create
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION fs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
---- alter
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
ALTER (DATABASE|SCHEMA) database_name SET LOCATION fs_path;
---- drop
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
---- use
USE database_name;
  • Example
---------------------- SQL statements
CREATE DATABASE IF NOT EXISTS alan_testdatabase
comment "This is a database comment"
with dbproperties ('createdBy'='alan');
--- runs in Hive; Flink SQL does not support this syntax
DESCRIBE DATABASE EXTENDED alan_testdatabase;

-- change database properties
ALTER DATABASE alan_testdatabase SET DBPROPERTIES ('createdBy'='alanchan','createddate'='2023-08-31');
-- change the database owner
ALTER DATABASE alan_testdatabase SET OWNER USER alanchan;
-- change the database location
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path;
-----------

------- create the database
Flink SQL> use catalog alan_hivecatalog;
Hive Session ID = 094c27ff-fb61-485c-b4ab-3119881224c9
[INFO] Execute statement succeed.

Flink SQL> CREATE DATABASE IF NOT EXISTS alan_testdatabase
> comment "This is a database comment"
> with dbproperties ('createdBy'='alan');
Hive Session ID = e406eed8-6575-4c0e-8b1b-b99db288a018
[INFO] Execute statement succeed.

Flink SQL> show databases;
Hive Session ID = f37fa06e-3a98-4c63-a304-d431d03c6bfa
+-------------------------+
|           database name |
+-------------------------+
| alan_hivecatalog_hivedb |
|       alan_testdatabase |
|                 default |
|                    test |
|                testhive |
+-------------------------+
5 rows in set

Flink SQL> DESCRIBE DATABASE EXTENDED alan_testdatabase;
Hive Session ID = 94218191-bcce-4722-8a05-2f11e6d6807b
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.client.gateway.SqlExecutionException: Failed to parse statement: DESCRIBE DATABASE EXTENDED alan_testdatabase
---------------- alter database properties
Flink SQL> ALTER DATABASE alan_testdatabase SET DBPROPERTIES ('createdBy'='alanchan','createddate'='2023-08-31');
Hive Session ID = 6a598d85-8468-4f66-969f-8e4e1fa1981d
[INFO] Execute statement succeed.

Flink SQL> ALTER DATABASE alan_testdatabase SET OWNER USER 'alanchan';
Hive Session ID = e9ce6a7a-6d89-48f8-81c9-5a303706c0f9
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.planner.delegation.hive.copy.HiveASTParseException: line 1:48 cannot recognize input near ''alanchan'' '<EOF>' '<EOF>' in identifier for principal spec

Flink SQL> ALTER DATABASE alan_testdatabase SET OWNER USER alanchan;
Hive Session ID = 7975a16d-cd2b-4be5-9a1e-90b1454860ed
[INFO] Execute statement succeed.

------------------------ query the database properties in Hive
0: jdbc:hive2://server4:10000> DESCRIBE DATABASE EXTENDED alan_testdatabase;
+--------------------+-----------------------------+-------------------+-------------+-------------+-------------------+
|      db_name       |           comment           |     location      | owner_name  | owner_type  |    parameters     |
+--------------------+-----------------------------+-------------------+-------------+-------------+-------------------+
| alan_testdatabase  | This is a database comment  | location/in/test  |             | USER        | {createdBy=alan}  |
+--------------------+-----------------------------+-------------------+-------------+-------------+-------------------+
1 row selected (0.048 seconds)
0: jdbc:hive2://server4:10000> DESCRIBE DATABASE EXTENDED alan_testdatabase;
+--------------------+-----------------------------+-------------------+-------------+-------------+-----------------------------------------------+
|      db_name       |           comment           |     location      | owner_name  | owner_type  |                  parameters                   |
+--------------------+-----------------------------+-------------------+-------------+-------------+-----------------------------------------------+
| alan_testdatabase  | This is a database comment  | location/in/test  |             | USER        | {createdBy=alanchan, createddate=2023-08-31}  |
+--------------------+-----------------------------+-------------------+-------------+-------------+-----------------------------------------------+
1 row selected (0.049 seconds)
0: jdbc:hive2://server4:10000> DESCRIBE DATABASE EXTENDED alan_testdatabase;
+--------------------+-----------------------------+-------------------+-------------+-------------+-----------------------------------------------+
|      db_name       |           comment           |     location      | owner_name  | owner_type  |                  parameters                   |
+--------------------+-----------------------------+-------------------+-------------+-------------+-----------------------------------------------+
| alan_testdatabase  | This is a database comment  | location/in/test  | alanchan    | USER        | {createdBy=alanchan, createddate=2023-08-31}  |
+--------------------+-----------------------------+-------------------+-------------+-------------+-----------------------------------------------+
1 row selected (0.047 seconds)
--------- drop the database from Flink
Flink SQL> drop database alan_testdatabase;
Hive Session ID = 30c71d25-a8fa-4b84-99b9-b6441937b6cf
[INFO] Execute statement succeed.

3)、table

------ create
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [column_constraint] [COMMENT col_comment], ... [table_constraint])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [
    [ROW FORMAT row_format]
    [STORED AS file_format]
  ]
  [LOCATION fs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]

row_format:
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
      [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
      [NULL DEFINED AS char]
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE
  | RCFILE
  | ORC
  | PARQUET
  | AVRO
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

column_constraint:
  : NOT NULL [[ENABLE|DISABLE] [VALIDATE|NOVALIDATE] [RELY|NORELY]]

table_constraint:
  : [CONSTRAINT constraint_name] PRIMARY KEY (col_name, ...) [[ENABLE|DISABLE] [VALIDATE|NOVALIDATE] [RELY|NORELY]]
  
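-------- alter
ALTER TABLE table_name RENAME TO new_table_name;
ALTER TABLE table_name SET TBLPROPERTIES (property_name = property_value, property_name = property_value, ... );
ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION fs_path;
-- If partition_spec is specified, it must be complete, i.e. it must give a value for every partition column. When it is specified, the operation applies to that partition rather than to the table.
ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;
-- If partition_spec is specified, it must be complete, i.e. it must give a value for every partition column. When it is specified, the operation applies to that partition rather than to the table.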

-- Update SerDe properties
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];

ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;

serde_properties:
  : (property_name = property_value, property_name = property_value, ... )
-- If partition_spec is specified, it must be complete, i.e. it must give a value for every partition column. When it is specified, the operation applies to that partition rather than to the table.

-- Add partitions
ALTER TABLE table_name ADD [IF NOT EXISTS] (PARTITION partition_spec [LOCATION fs_path])+;
-- Drop partitions
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...];
-- Add/replace columns
ALTER TABLE table_name
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]
-- Change column
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
-- Drop table
DROP TABLE [IF EXISTS] table_name;

 -------- drop
 drop table tableName;
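
The comments in the listing above state that a partition_spec must give a value for every partition column. Below is a minimal plain-Java sketch of that rule as a check that could run before a statement is submitted. It is not Flink or Hive code: the class `PartitionSpecCheck`, the method `isComplete`, and the column names `dept` and `sex` are hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that models the "partition_spec must be complete" rule.
final class PartitionSpecCheck {

    // Returns true only if the spec has exactly as many entries as there are
    // partition columns and every partition column has a non-null value.
    static boolean isComplete(List<String> partitionColumns, Map<String, String> spec) {
        if (spec.size() != partitionColumns.size()) {
            return false;
        }
        for (String column : partitionColumns) {
            if (spec.get(column) == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed partition columns of an illustrative two-level partitioned table.
        List<String> columns = Arrays.asList("dept", "sex");

        Map<String, String> full = new LinkedHashMap<>();
        full.put("dept", "MA");
        full.put("sex", "M");

        Map<String, String> partial = new LinkedHashMap<>();
        partial.put("dept", "MA");

        System.out.println(isComplete(columns, full));    // true
        System.out.println(isComplete(columns, partial)); // false
    }
}
```

A spec that names only `dept` would be rejected by this check, which mirrors the requirement that statements such as SET LOCATION or SET FILEFORMAT on a partition identify exactly one partition.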

  • Basic table operation example
Flink SQL> create table student(
>     num int,
>     name string,
>     sex string,
>     age int,
>     dept string)
> row format delimited
> fields terminated by ',';
Hive Session ID = d3a87e7c-4175-4c07-957a-4e855a537654
[INFO] Execute statement succeed.

Flink SQL> show tables;
Hive Session ID = 820371a4-d23f-4660-a558-905bd6c578e1
+------------+
| table name |
+------------+
|    student |
+------------+
1 row in set

Flink SQL> create external table student_ext(
>     num int,
>     name string,
>     sex string,
>     age int,
>     dept string)
> row format delimited
> fields terminated by ',';
Hive Session ID = 57656797-60f0-420e-8a0e-873ef5356303
[INFO] Execute statement succeed.

Flink SQL> show tables;
Hive Session ID = 060b1ab4-58a5-4377-b86c-4941e2dd672e
+-------------+
|  table name |
+-------------+
|     student |
| student_ext |
+-------------+
2 rows in set

Flink SQL> desc student;
Hive Session ID = 25346927-8a02-4178-ae17-0444d7c2ded4
+------+--------+------+-----+--------+-----------+
| name |   type | null | key | extras | watermark |
+------+--------+------+-----+--------+-----------+
|  num |    INT | true |     |        |           |
| name | STRING | true |     |        |           |
|  sex | STRING | true |     |        |           |
|  age |    INT | true |     |        |           |
| dept | STRING | true |     |        |           |
+------+--------+------+-----+--------+-----------+
5 rows in set

Flink SQL> desc student_ext;
Hive Session ID = 4858109e-8e2c-49e0-9319-37dc347de73a
+------+--------+------+-----+--------+-----------+
| name |   type | null | key | extras | watermark |
+------+--------+------+-----+--------+-----------+
|  num |    INT | true |     |        |           |
| name | STRING | true |     |        |           |
|  sex | STRING | true |     |        |           |
|  age |    INT | true |     |        |           |
| dept | STRING | true |     |        |           |
+------+--------+------+-----+--------+-----------+
5 rows in set
Flink SQL> show tables;
Hive Session ID = a6bc1062-de92-4aa3-abda-587124663002
+-------------+
|  table name |
+-------------+
|     student |
|    student1 |
| student_ext |
+-------------+
3 rows in set

----------- drop the table
Flink SQL> drop table student;
Hive Session ID = e270ce9f-3bd1-4241-9ad4-a8e42405645b
[INFO] Execute statement succeed.

----------- alter the table
--1、Rename the table
ALTER TABLE student RENAME TO alan_student;
Flink SQL> ALTER TABLE student RENAME TO alan_student;
Hive Session ID = f0659cc5-e578-403e-8a9e-0989a64652dc
[INFO] Execute statement succeed.

Flink SQL> show tables;
Hive Session ID = 6a1fdf70-43d5-40bd-86f7-4430b2d30e83
+--------------+
|   table name |
+--------------+
| alan_student |
|  student_ext |
+--------------+
2 rows in set

--2、Change table properties
ALTER TABLE table_name SET TBLPROPERTIES (property_name = property_value, ... );
-- change the table comment
ALTER TABLE alan_student SET TBLPROPERTIES ('comment' = "new comment for alan_student table");
Flink SQL> ALTER TABLE alan_student SET TBLPROPERTIES ('comment' = "new comment for alan_student table");
Hive Session ID = c3a1f8ee-11fd-4e33-94bd-e941e2928c9d
[INFO] Execute statement succeed.
0: jdbc:hive2://server4:10000> DESC FORMATTED alan_student;
+-------------------------------+----------------------------------------------------+-------------------------------------+
|           col_name            |                     data_type                      |               comment               |
+-------------------------------+----------------------------------------------------+-------------------------------------+
| # col_name                    | data_type                                          | comment                             |
| num                           | int                                                |                                     |
| name                          | string                                             |                                     |
| sex                           | string                                             |                                     |
| age                           | int                                                |                                     |
| dept                          | string                                             |                                     |
|                               | NULL                                               | NULL                                |
| # Detailed Table Information  | NULL                                               | NULL                                |
| Database:                     | alan_testdatabase                                  | NULL                                |
| OwnerType:                    | USER                                               | NULL                                |
| Owner:                        | null                                               | NULL                                |
| CreateTime:                   | Thu Aug 31 14:02:55 CST 2023                       | NULL                                |
| LastAccessTime:               | UNKNOWN                                            | NULL                                |
| Retention:                    | 0                                                  | NULL                                |
| Location:                     | hdfs://HadoopHAcluster/user/hive/warehouse/alan_testdatabase.db/alan_student | NULL                                |
| Table Type:                   | MANAGED_TABLE                                      | NULL                                |
| Table Parameters:             | NULL                                               | NULL                                |
|                               | bucketing_version                                  | 2                                   |
|                               | comment                                            | new comment for alan_student table  |
|                               | numFiles                                           | 0                                   |
|                               | totalSize                                          | 0                                   |
|                               | transient_lastDdlTime                              | 1693461775                          |
|                               | NULL                                               | NULL                                |
| # Storage Information         | NULL                                               | NULL                                |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                                |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                                |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat | NULL                                |
| Compressed:                   | No                                                 | NULL                                |
| Num Buckets:                  | -1                                                 | NULL                                |
| Bucket Columns:               | []                                                 | NULL                                |
| Sort Columns:                 | []                                                 | NULL                                |
| Storage Desc Params:          | NULL                                               | NULL                                |
|                               | field.delim                                        | ,                                   |
|                               | serialization.format                               | ,                                   |
+-------------------------------+----------------------------------------------------+-------------------------------------+
34 rows selected (0.069 seconds)

--3、Change a column's name/type/position/comment
CREATE TABLE test_change (a int, b int, c int);
Flink SQL> CREATE TABLE test_change (a int, b int, c int);

-- rename the column
ALTER TABLE test_change CHANGE a a1 INT;
Flink SQL> ALTER TABLE test_change CHANGE a a1 INT;
Flink SQL> desc test_change;
+------+------+------+-----+--------+-----------+
| name | type | null | key | extras | watermark |
+------+------+------+-----+--------+-----------+
|   a1 |  INT | true |     |        |           |
|    b |  INT | true |     |        |           |
|    c |  INT | true |     |        |           |
+------+------+------+-----+--------+-----------+

-- change the column name and type, and move it after column b
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
Flink SQL> ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
Flink SQL> desc test_change;
+------+--------+------+-----+--------+-----------+
| name |   type | null | key | extras | watermark |
+------+--------+------+-----+--------+-----------+
|    b |    INT | true |     |        |           |
|   a2 | STRING | true |     |        |           |
|    c |    INT | true |     |        |           |
+------+--------+------+-----+--------+-----------+
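--4、Add/replace columns
-- ADD COLUMNS appends the new columns after the existing columns but before the partition columns.
-- REPLACE COLUMNS removes all existing columns and adds the new set of columns.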
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type,...);
Flink SQL> desc  alan_student;
+------+--------+------+-----+--------+-----------+
| name |   type | null | key | extras | watermark |
+------+--------+------+-----+--------+-----------+
|  num |    INT | true |     |        |           |
| name | STRING | true |     |        |           |
|  sex | STRING | true |     |        |           |
|  age |    INT | true |     |        |           |
| dept | STRING | true |     |        |           |
+------+--------+------+-----+--------+-----------+
Flink SQL> ALTER TABLE alan_student ADD COLUMNS (balance int);
Flink SQL> desc  alan_student;
+---------+--------+------+-----+--------+-----------+
|    name |   type | null | key | extras | watermark |
+---------+--------+------+-----+--------+-----------+
|     num |    INT | true |     |        |           |
|    name | STRING | true |     |        |           |
|     sex | STRING | true |     |        |           |
|     age |    INT | true |     |        |           |
|    dept | STRING | true |     |        |           |
| balance |    INT | true |     |        |           |
+---------+--------+------+-----+--------+-----------+

Flink SQL> ALTER TABLE alan_student REPLACE COLUMNS (age int);
Flink SQL> desc  alan_student;
+------+------+------+-----+--------+-----------+
| name | type | null | key | extras | watermark |
+------+------+------+-----+--------+-----------+
|  age |  INT | true |     |        |           |
+------+------+------+-----+--------+-----------+
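
The desc output above shows how CHANGE ... AFTER, ADD COLUMNS, and REPLACE COLUMNS affect the column list. A minimal plain-Java sketch that models only this ordering behaviour with a list of column names follows. It ignores types, comments, and partition columns, and the class `ColumnOps` and its methods are hypothetical names, not part of Flink or Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of how the column list changes; not Flink or Hive code.
final class ColumnOps {

    // CHANGE oldName newName ... AFTER anchor:
    // rename the column and place it directly after the anchor column.
    static List<String> changeAfter(List<String> columns, String oldName, String newName, String anchor) {
        List<String> result = new ArrayList<>(columns);
        if (!result.remove(oldName)) {
            throw new IllegalArgumentException("unknown column: " + oldName);
        }
        int anchorIndex = result.indexOf(anchor);
        if (anchorIndex < 0) {
            throw new IllegalArgumentException("unknown column: " + anchor);
        }
        result.add(anchorIndex + 1, newName);
        return result;
    }

    // ADD COLUMNS: append the new columns after the existing ones.
    static List<String> addColumns(List<String> columns, String... added) {
        List<String> result = new ArrayList<>(columns);
        result.addAll(Arrays.asList(added));
        return result;
    }

    // REPLACE COLUMNS: the existing columns are intentionally discarded.
    static List<String> replaceColumns(List<String> columns, String... replacement) {
        return new ArrayList<>(Arrays.asList(replacement));
    }

    public static void main(String[] args) {
        System.out.println(changeAfter(Arrays.asList("a1", "b", "c"), "a1", "a2", "b")); // [b, a2, c]

        List<String> student = Arrays.asList("num", "name", "sex", "age", "dept");
        System.out.println(addColumns(student, "balance")); // [num, name, sex, age, dept, balance]
        System.out.println(replaceColumns(student, "age")); // [age]
    }
}
```

The three printed lists correspond to the column orders reported by desc for test_change and alan_student in the session above.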

  • Partition table operation example
--1、Add partitions
-- create a table with a single partition column
create table user_dept (
    num int,
    name string,
    sex string,
    age int) 
partitioned by (dept string) 
row format delimited fields terminated by ',';
-- load data
load data inpath '/hivetest/partition/students_MA.txt' into table user_dept partition(dept ="MA");

-- add one partition at a time
ALTER TABLE user_dept ADD PARTITION (dept='IS') LOCATION '/user/hive/warehouse/testhive.db/user_dept/dept=IS';
-- load data
load data inpath '/hivetest/partition/students_IS.txt' into table user_dept partition(dept ="IS");

-- add multi-level partitions
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
  
-- create a table with two partition levels
create table user_dept_sex (id int, name string,age int) 
partitioned by (dept string, sex string)
row format delimited fields terminated by ",";
-- add several partitions at once
ALTER TABLE user_dept_sex ADD PARTITION (dept='MA', sex='M') 
PARTITION (dept='MA', sex='F') 
PARTITION (dept='IS', sex='M') 
PARTITION (dept='IS', sex='F') ;
-- load data
load data inpath '/hivetest/partition/user_dept/ma/m' into table user_dept_sex partition(dept='MA', sex='M');
load data inpath '/hivetest/partition/user_dept/ma/f' into table user_dept_sex partition(dept='MA', sex='F');
load data inpath '/hivetest/partition/user_dept/is/m' into table user_dept_sex partition(dept='IS', sex='M');
load data inpath '/hivetest/partition/user_dept/is/f' into table user_dept_sex partition(dept='IS', sex='F');

--2、Rename partitions
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

ALTER TABLE user_dept_sex PARTITION (dept='MA', sex='M') RENAME TO PARTITION (dept='MA', sex='Male');
ALTER TABLE user_dept_sex PARTITION (dept='MA', sex='F') RENAME TO PARTITION (dept='MA', sex='Female');
ALTER TABLE user_dept_sex PARTITION (dept='IS', sex='M') RENAME TO PARTITION (dept='IS', sex='Male');
ALTER TABLE user_dept_sex PARTITION (dept='IS', sex='F') RENAME TO PARTITION (dept='IS', sex='Female');

--3、Drop partitions
-- Dropping a partition of a table removes both the data and the metadata of that partition.
ALTER TABLE table_name DROP [IF EXISTS] PARTITION (dt='2008-08-08', country='us');
ALTER TABLE table_name DROP [IF EXISTS] PARTITION (dt='2008-08-08', country='us') PURGE; -- delete the data directly, bypassing the trash

--4、Alter partitions
-- change the file storage format of a partition
ALTER TABLE table_name PARTITION (dt='2008-08-09') SET FILEFORMAT file_format;
-- change the location of a partition
ALTER TABLE table_name PARTITION (dt='2008-08-09') SET LOCATION "new location";
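
Statements such as the multi-partition ADD PARTITION shown earlier are often generated by a program rather than typed by hand. The following is a minimal Java sketch that renders such a statement as a string. It assumes string-typed partition values that contain no single quotes, and the class `AddPartitionSql` with its methods is a hypothetical helper. The resulting string would still have to be executed with the Hive dialect enabled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that renders ALTER TABLE ... ADD PARTITION statements.
final class AddPartitionSql {

    // Renders one spec, e.g. {dept=MA, sex=M} -> PARTITION (dept='MA', sex='M').
    // Assumes values are strings without single quotes (no escaping is done).
    static String partitionClause(Map<String, String> spec) {
        StringJoiner joiner = new StringJoiner(", ", "PARTITION (", ")");
        for (Map.Entry<String, String> entry : spec.entrySet()) {
            joiner.add(entry.getKey() + "='" + entry.getValue() + "'");
        }
        return joiner.toString();
    }

    // Renders a single statement that adds all given partitions.
    static String addPartitions(String table, List<Map<String, String>> specs) {
        StringBuilder sql = new StringBuilder("ALTER TABLE " + table + " ADD");
        for (Map<String, String> spec : specs) {
            sql.append(' ').append(partitionClause(spec));
        }
        return sql.toString();
    }

    // Convenience for the two-level (dept, sex) layout of user_dept_sex;
    // LinkedHashMap keeps the partition columns in declaration order.
    static Map<String, String> spec(String dept, String sex) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("dept", dept);
        result.put("sex", sex);
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> specs = new ArrayList<>();
        specs.add(spec("MA", "M"));
        specs.add(spec("MA", "F"));
        System.out.println(addPartitions("user_dept_sex", specs));
        // ALTER TABLE user_dept_sex ADD PARTITION (dept='MA', sex='M') PARTITION (dept='MA', sex='F')
    }
}
```

Building the clause from an ordered map keeps the partition columns in the same order as in the table definition, which makes the generated statement easy to compare with the handwritten one.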

4)、VIEW

-- create
CREATE VIEW [IF NOT EXISTS] view_name [(column_name, ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ...;
-- Note: altering views only works in the Table API; the SQL client does not support it.
-- alter
ALTER VIEW view_name RENAME TO new_view_name;
ALTER VIEW view_name SET TBLPROPERTIES (property_name = property_value, ... );
ALTER VIEW view_name AS select_statement;
-- drop
DROP VIEW [IF EXISTS] view_name;
  • Example
-- Hive already contains a real base table, dim_user
select * from test.dim_user;

--1、Create a view
create view dim_user_v as select * from dim_user limit 4;

-- create a view from an existing view
create view dim_user_from_v as select * from dim_user_v limit 2;

--2、Show the existing views
show tables;
show views; -- supported since Hive v2.2.0

--3、Query the views
select * from dim_user_v ;
select * from dim_user_from_v ;

--4、Show the view definition
show create table dim_user_v ;

--5、Drop a view
drop view dim_user_from_v;

--6、Change view properties
alter view dim_user_v set TBLPROPERTIES ('comment' = 'This is a view');

--7、Change the view definition
alter view dim_user_v as  select name from dim_user limit 2;
----------------------- Flink SQL CLI session log
Flink SQL> select * from dim_user;

+----+--------------------------------+--------------------------------+
| op |                             id |                           name |
+----+--------------------------------+--------------------------------+
| +I |                              4 |                           赵六 |
| +I |                              5 |                           alan |
| +I |                              1 |                           张三 |
| +I |                              2 |                           李四 |
| +I |                              3 |                           王五 |
+----+--------------------------------+--------------------------------+
Received a total of 5 rows

Flink SQL> create view dim_user_v as select * from dim_user limit 4;

Flink SQL> create view dim_user_from_v as select * from dim_user_v limit 2;

Flink SQL> show views;
Hive Session ID = ebae650c-ae23-4262-b9da-8ccef16d1b91
+-----------------+
|       view name |
+-----------------+
| dim_user_from_v |
|      dim_user_v |
+-----------------+
2 rows in set

Flink SQL> select * from dim_user_v ;

+----+--------------------------------+--------------------------------+
| op |                             id |                           name |
+----+--------------------------------+--------------------------------+
| +I |                              1 |                           张三 |
| +I |                              2 |                           李四 |
| +I |                              3 |                           王五 |
| +I |                              4 |                           赵六 |
+----+--------------------------------+--------------------------------+
Received a total of 4 rows

Flink SQL> show create table dim_user_v ;
Hive Session ID = 62802230-2420-484c-9a41-a66d45be3b3c
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.client.gateway.SqlExecutionException: Failed to parse statement: show create table dim_user_v

Flink SQL> select * from dim_user_from_v ;
+----+--------------------------------+--------------------------------+
| op |                             id |                           name |
+----+--------------------------------+--------------------------------+
| +I |                              1 |                           张三 |
| +I |                              2 |                           李四 |
+----+--------------------------------+--------------------------------+
Received a total of 2 rows

Flink SQL> drop view dim_user_from_v;

Flink SQL> alter view dim_user_v set TBLPROPERTIES ('comment' = 'This is a view');

Flink SQL> alter view dim_user_v as  select name from dim_user limit 2;

Flink SQL> show views;
+------------+
|  view name |
+------------+
| dim_user_v |
+------------+
1 row in set

Flink SQL> select * from dim_user_v;
+----+--------------------------------+
| op |                           name |
+----+--------------------------------+
| +I |                           张三 |
| +I |                           李四 |
+----+--------------------------------+
Received a total of 2 rows

5)、FUNCTION

-- Create a function
CREATE FUNCTION function_name AS class_name;
-- Drop a function
DROP FUNCTION [IF EXISTS] function_name;
Flink SQL> select current_date();
+----+------------+
| op |     _o__c0 |
+----+------------+
| +I | 2023-08-31 |
+----+------------+
Received a total of 1 row

Flink SQL> select floor(3.1415926);
+----+------------+
| op |     _o__c0 |
+----+------------+
| +I |          3 |
+----+------------+
Received a total of 1 row
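
As a concrete illustration of the CREATE/DROP templates above, the following sketch registers a Hive UDF, calls it, and removes it again. The function name my_upper and the class com.example.udf.MyUpper are hypothetical and only stand in for a real UDF class, which would have to be available on the session classpath. This snippet was not run as part of the article.

```sql
-- Assumes the Hive dialect is active and the current catalog is a HiveCatalog
set table.sql-dialect=hive;

-- Register a function backed by a (hypothetical) Hive UDF class
CREATE FUNCTION my_upper AS 'com.example.udf.MyUpper';

-- List the functions visible in the session; my_upper should appear if registration succeeded
SHOW FUNCTIONS;

-- Call it like any built-in function
select my_upper(name) from dim_user;

-- Remove it; IF EXISTS avoids an error when the function is already gone
DROP FUNCTION IF EXISTS my_upper;
```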

3、DML & DQL

The Hive dialect supports commonly used Hive DML and DQL. The following list shows some of the syntax supported by the Hive dialect.

SORT/CLUSTER/DISTRIBUTE BY
Group By
Join
Union
LATERAL VIEW
Window Functions
SubQueries
CTE
INSERT INTO dest schema
Implicit type conversions

For better syntactic and semantic compatibility, it is strongly recommended to load HiveModule and place it first in the module list, so that Hive built-in functions take precedence during function resolution.

Under the Hive dialect, Flink SQL syntax is no longer accepted for these statements. To write Flink syntax, switch back to the default dialect.
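
A minimal sketch of both points, assuming a SQL client session in which the hive module has already been loaded: it switches to the default dialect to run a Flink-syntax statement (SHOW FULL MODULES, which lists the loaded modules in resolution order together with whether each is in use), then switches back to the Hive dialect. It was not run as part of this article.

```sql
-- Switch to the default dialect to write Flink syntax
set table.sql-dialect=default;

-- Flink statement: shows every loaded module, in resolution order, and whether it is used.
-- With "use modules hive,core" in effect, hive is expected to come before core.
SHOW FULL MODULES;

-- Switch back to the Hive dialect for Hive-style DML and DQL
set table.sql-dialect=hive;
```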

The following is an example using the Hive dialect.

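Note: in streaming mode, INSERT OVERWRITE is rejected (the session log below fails with "Streaming mode not support overwrite."), so the example writes the data with INSERT INTO instead.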

CREATE CATALOG alan_hivecatalog WITH (
    'type' = 'hive',
    'default-database' = 'testhive',
    'hive-conf-dir' = '/usr/local/bigdata/apache-hive-3.1.2-bin/conf'
);
use catalog alan_hivecatalog;
set table.sql-dialect=hive;
load module hive;
use modules hive,core;
select explode(array(1,2,3));
create table tbl (key int,value string);
set execution.runtime-mode=streaming; 
insert into table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
select * from tbl;

--------------------Flink SQL CLI session log
Flink SQL> select explode(array(1,2,3));
Hive Session ID = 7d3ae2d5-24f3-4d97-9897-83c8a9abda9b
[ERROR] Could not execute SQL statement. Reason:
org.apache.hadoop.hive.ql.parse.SemanticException: Invalid function explode

Flink SQL> set table.sql-dialect=hive;

Flink SQL> select explode(array(1,2,3));
Hive Session ID = c0b87333-4957-4c18-b197-27649a3f2ae2
[ERROR] Could not execute SQL statement. Reason:
org.apache.hadoop.hive.ql.parse.SemanticException: Invalid function explode

Flink SQL> load module hive;

Flink SQL> use modules hive,core;

Flink SQL> select explode(array(1,2,3));

+----+-------------+
| op |         col |
+----+-------------+
| +I |           1 |
| +I |           2 |
| +I |           3 |
+----+-------------+
Received a total of 3 rows

Flink SQL> create table tbl (key int,value string);

Flink SQL> insert overwrite table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
Hive Session ID = 12fe08fa-5e63-44b2-8fc3-a90064959451
[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement. Reason:
java.lang.IllegalStateException: Streaming mode not support overwrite.

Flink SQL> set execution.runtime-mode=batch; 
Hive Session ID = 4f17cc70-165c-4540-a299-874b66458521
[INFO] Session property has been set.

Flink SQL> insert overwrite table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
Hive Session ID = 1923623f-03d3-44b4-93ab-ee8498c5da06
[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement. Reason:
java.lang.IllegalArgumentException: Checkpoint is not supported for batch jobs.

Flink SQL> set execution.runtime-mode=streaming; 

Flink SQL> insert into table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');

Flink SQL> select * from tbl;
+----+-------------+--------------------------------+
| op |         key |                          value |
+----+-------------+--------------------------------+
| +I |           5 |                              e |
| +I |           1 |                              a |
| +I |           1 |                              a |
| +I |           3 |                              c |
| +I |           2 |                              b |
| +I |           3 |                              c |
| +I |           3 |                              c |
| +I |           4 |                              d |
+----+-------------+--------------------------------+
Received a total of 8 rows
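
To make the list of supported syntax more concrete, here is a sketch of a few Hive-style queries against the tbl table created above (Group By, LATERAL VIEW, CTE, SubQueries and Union). It assumes the same session state as the example, that is, the Hive dialect is active and the hive module is listed before core. These statements were not run as part of this article, so their behaviour in streaming mode is not confirmed here.

```sql
-- Group By: number of rows per key
select key, count(*) as cnt from tbl group by key;

-- LATERAL VIEW: expand an array built from each row into one row per element
select key, item
from tbl lateral view explode(array(value, upper(value))) t as item;

-- CTE: name an intermediate result, then filter it
with key_cnt as (
  select key, count(*) as cnt from tbl group by key
)
select key, cnt from key_cnt where cnt > 1;

-- SubQuery: keep only the rows whose key also appears with value 'c'
select * from tbl where key in (select key from tbl where value = 'c');

-- Union: combine two filtered result sets, keeping duplicates
select key, value from tbl where key < 2
union all
select key, value from tbl where key > 4;
```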

4、Notes

Keep the following points in mind when using the Hive dialect.

  • The Hive dialect can only be used to manipulate Hive objects, and it requires the current catalog to be a HiveCatalog.
  • The Hive dialect only supports two-level identifiers such as db.table; identifiers that include a catalog name are not supported.
  • Although all Hive versions support the same syntax, whether a given feature is available depends on the Hive version in use. For example, updating the database location is only supported in Hive-2.4.0 or higher.
  • Use HiveModule when executing DML and DQL.
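
The first two points can be sketched with the catalog, database and table names used earlier in this article (alan_hivecatalog, testhive, tbl). The snippet is illustrative only and was not run as part of this article.

```sql
-- The current catalog must be a HiveCatalog before the Hive dialect is used
use catalog alan_hivecatalog;
set table.sql-dialect=hive;

-- Two-level identifier (database.table): supported
select * from testhive.tbl;

-- Three-level identifier with the catalog name: not supported by the Hive dialect,
-- so it is left commented out here
-- select * from alan_hivecatalog.testhive.tbl;
```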

This concludes the walkthrough of using the Hive dialect in Flink SQL.

Origin blog.csdn.net/chenwewi520feng/article/details/132046495