1. Overview
Flink CDC is a set of source connectors for Apache Flink® that use Change Data Capture (CDC) to ingest changes from different databases. CDC Connectors for Apache Flink embeds Debezium as the engine for capturing data changes, so it can take full advantage of Debezium's capabilities.
2. Supported connectors
Connector | Database | Driver |
---|---|---|
mongodb-cdc | MongoDB: 3.6, 4.x, 5.0 | MongoDB Driver: 4.3.4 |
mysql-cdc | MySQL: 5.6, 5.7, 8.0.x, RDS MySQL: 5.6, 5.7, 8.0.x, PolarDB MySQL: 5.6, 5.7, 8.0.x, Aurora MySQL: 5.6, 5.7, 8.0.x, MariaDB: 10.x, PolarDB-X: 2.0.1 | JDBC Driver: 8.0.28 |
oceanbase-cdc | OceanBase CE: 3.1.x, 4.x, OceanBase EE: 2.x, 3.x, 4.x | OceanBase Driver: 2.4.x |
oracle-cdc | Oracle: 11, 12, 19, 21 | Oracle Driver: 19.3.0.0 |
postgres-cdc | PostgreSQL: 9.6, 10, 11, 12, 13, 14 | JDBC Driver: 42.5.1 |
sqlserver-cdc | SQL Server: 2012, 2014, 2016, 2017, 2019 | JDBC Driver: 9.4.1.jre8 |
tidb-cdc | TiDB: 5.1.x, 5.2.x, 5.3.x, 5.4.x, 6.0.0 | JDBC Driver: 8.0.27 |
db2-cdc | Db2: 11.5 | Db2 Driver: 11.5.0.0 |
vitess-cdc | Vitess: 8.0.x, 9.0.x | MySQL JDBC Driver: 8.0.26 |
3. Supported Flink versions
The following table shows the version correspondence between Flink CDC Connectors and Flink®:
Flink CDC version | Flink version |
---|---|
1.0.0 | 1.11.* |
1.1.0 | 1.11.* |
1.2.0 | 1.12.* |
1.3.0 | 1.12.* |
1.4.0 | 1.13.* |
2.0.* | 1.13.* |
2.1.* | 1.13.* |
2.2.* | 1.13.*, 1.14.* |
2.3.* | 1.13.*, 1.14.*, 1.15.*, 1.16.0 |
2.4.* | 1.13.*, 1.14.*, 1.15.*, 1.16.*, 1.17.0 |
4. Features
Supports reading a full database snapshot and then continuing to read the binlog, with exactly-once processing even when failures occur.
CDC connectors for the DataStream API: users can consume changes from multiple databases and tables in a single job, without deploying Debezium and Kafka.
CDC connectors for the Table/SQL API: users can use SQL DDL to create a CDC source and monitor changes on a single table.
5. Table/SQL API usage
A few steps are needed to set up a Flink cluster with the provided connectors.
First, install a Flink 1.17+ cluster (Java 8+).
Note: If you need to install Flink, please refer to the author's corresponding blog post on building a high-availability Flink cluster (Standalone mode).
The jar packages used in this article are flink-connector-jdbc-3.1.1-1.17.jar and flink-sql-connector-mysql-cdc-2.2.1.jar.
Download the connector SQL jars (or build them yourself).
Put the downloaded jar packages into FLINK_HOME/lib/.
Restart the Flink cluster.
Note: At present, versions 2.4 and above need to be compiled and built from source; the author built and uploaded the jars himself.
6. Use Flink CDC to perform streaming ETL on MySQL
This tutorial will show how to use Flink CDC to quickly build a streaming ETL for MySQL.
Suppose we store product data in MySQL and want to synchronize it to another MySQL database.
In the following sections, we will introduce how to use Flink MySQL CDC to achieve this. All exercises in this tutorial are performed in the Flink SQL CLI, and the whole process uses standard SQL syntax, without any Java/Scala code or IDE installation.
The architecture is outlined as follows:
7. Environment preparation
You need a running MySQL database. For details on how to install MySQL, please refer to the author's blog post on installing the MySQL database on Ubuntu.
Note: For other operating systems, please check the corresponding installation tutorials in other blog posts.
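Beyond installing MySQL, the mysql-cdc connector needs the binlog enabled, and the source and target databases and tables must exist. The following is a minimal preparation sketch to run in the MySQL client; the schemas are hypothetical, chosen to match the Flink DDL used later in this tutorial.

```sql
-- Hypothetical preparation, run in the MySQL client (not Flink SQL).
-- mysql-cdc also requires the binlog to be enabled (log-bin) with
-- binlog_format=ROW, and a user with replication privileges.
CREATE DATABASE mydb;   -- source database
CREATE DATABASE mydb1;  -- target database

CREATE TABLE mydb.products (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512),
  PRIMARY KEY (id)
);

-- Target table with the same schema as the source table.
CREATE TABLE mydb1.product LIKE mydb.products;
```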
8. Create tables using Flink DDL in Flink SQL CLI
Start the Flink SQL CLI with the following command:
./bin/sql-client.sh
We should see the welcome screen of the CLI client.
First, enable checkpointing every 3 seconds:
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
Edit the Flink SQL for the source table as follows (comments use standard SQL `--` syntax):
CREATE TABLE products (
  id INT NOT NULL,
  name STRING,
  description STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',       -- requires the CDC jar; without it an error will prompt you to add it
  'hostname' = '192.168.50.163',   -- source database host; adjust to your own setup (this is the author's machine)
  'port' = '3306',                 -- source database port
  'username' = 'root',             -- source database user
  'password' = '*****',            -- source database password
  'database-name' = 'mydb',        -- source database
  'table-name' = 'products'        -- source table
);
Execute the following statement in Flink SQL to create a table that captures change data from the corresponding database table:
-- Flink SQL
Flink SQL> CREATE TABLE products (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = '192.168.50.163',
'port' = '3306',
'username' = 'root',
'password' = '****',
'database-name' = 'mydb',
'table-name' = 'products'
);
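As an optional sanity check (not in the original tutorial), the source table can be queried directly before wiring up the sink; this starts a streaming query that first emits a snapshot of the table and then change events as they arrive.

```sql
-- Flink SQL
Flink SQL> SELECT * FROM products;
-- The CLI renders a continuously updating result view; quit it with Q.
```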
Edit the Flink SQL for the target table as follows (comments use standard SQL `--` syntax):
CREATE TABLE product (
  id INT,
  name STRING,
  description STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  -- requires the flink-connector-jdbc jar; without it an error will prompt you to add it
  'connector' = 'jdbc',
  -- target database JDBC URL; adjust to your own setup (this is the author's machine).
  -- Some newer MySQL versions require useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
  'url' = 'jdbc:mysql://192.168.50.163:3306/mydb1?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC',
  -- JDBC driver class for the target database
  'driver' = 'com.mysql.cj.jdbc.Driver',
  -- target database user
  'username' = 'root',
  -- target database password
  'password' = '***',
  -- target table
  'table-name' = 'product'
);
Execute the following statement in Flink SQL to map the change-capturing table to the target database table:
-- Flink SQL
Flink SQL> CREATE TABLE product (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://192.168.50.163:3306/mydb1?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'root',
'password' = 'root',
'table-name' = 'product'
);
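The jdbc connector also supports bounded reads, so the mapping can be verified with a one-off scan (an optional step, not in the original tutorial):

```sql
-- Flink SQL: bounded scan of the target table via JDBC
Flink SQL> SELECT * FROM product;
```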
9. Load the source table into the target MySQL
Use Flink SQL to continuously write the products source table into the product table in the target MySQL.
-- Flink SQL
Flink SQL> INSERT INTO product SELECT * FROM products;
The specific steps are as follows:
Add data in the source database, as shown in the figure below:
The synchronized result in the target database is shown in the figure below:
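To exercise the pipeline end to end, statements like the following can be run in the MySQL client against the source database; inserts, updates, and deletes are all captured from the binlog and applied to the target. The sample rows here are made up for illustration.

```sql
-- Run in the MySQL client against the source database.
INSERT INTO mydb.products (name, description) VALUES
  ('scooter', 'Small 2-wheel scooter'),
  ('car battery', '12V car battery');

UPDATE mydb.products
  SET description = 'Small 2-wheel electric scooter'
  WHERE name = 'scooter';

DELETE FROM mydb.products WHERE name = 'car battery';

-- Then check the target database; the changes should appear within seconds:
SELECT * FROM mydb1.product;
```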
10. View running jobs in the Flink web UI
The task highlighted in the red box is the running synchronization job.
This concludes the first part on synchronizing MySQL to MySQL with Flink CDC; more complex operations will be covered in later posts.