Flink CDC MySQL-to-MySQL Synchronization (Part 1)

1. Overview

Flink CDC is a set of source connectors for Apache Flink® that use Change Data Capture (CDC) to ingest changes from different databases. CDC Connectors for Apache Flink embed Debezium as the engine for capturing data changes, so they can fully leverage Debezium's capabilities.

2. Supported connectors

| Connector | Database | Driver |
|---|---|---|
| mongodb-cdc | MongoDB: 3.6, 4.x, 5.0 | MongoDB Driver: 4.3.4 |
| mysql-cdc | MySQL: 5.6, 5.7, 8.0.x; RDS MySQL: 5.6, 5.7, 8.0.x; PolarDB MySQL: 5.6, 5.7, 8.0.x; Aurora MySQL: 5.6, 5.7, 8.0.x; MariaDB: 10.x; PolarDB-X: 2.0.1 | JDBC Driver: 8.0.28 |
| oceanbase-cdc | OceanBase CE: 3.1.x, 4.x; OceanBase EE: 2.x, 3.x, 4.x | OceanBase Driver: 2.4.x |
| oracle-cdc | Oracle: 11, 12, 19, 21 | Oracle Driver: 19.3.0.0 |
| postgres-cdc | PostgreSQL: 9.6, 10, 11, 12, 13, 14 | JDBC Driver: 42.5.1 |
| sqlserver-cdc | SQL Server: 2012, 2014, 2016, 2017, 2019 | JDBC Driver: 9.4.1.jre8 |
| tidb-cdc | TiDB: 5.1.x, 5.2.x, 5.3.x, 5.4.x, 6.0.0 | JDBC Driver: 8.0.27 |
| db2-cdc | Db2: 11.5 | Db2 Driver: 11.5.0.0 |
| vitess-cdc | Vitess: 8.0.x, 9.0.x | MySQL JDBC Driver: 8.0.26 |

3. Supported Flink versions

The following table shows the version correspondence between Flink CDC Connectors and Flink®:

| Flink CDC version | Flink version |
|---|---|
| 1.0.0 | 1.11.* |
| 1.1.0 | 1.11.* |
| 1.2.0 | 1.12.* |
| 1.3.0 | 1.12.* |
| 1.4.0 | 1.13.* |
| 2.0.* | 1.13.* |
| 2.1.* | 1.13.* |
| 2.2.* | 1.13.*, 1.14.* |
| 2.3.* | 1.13.*, 1.14.*, 1.15.*, 1.16.0 |
| 2.4.* | 1.13.*, 1.14.*, 1.15.*, 1.16.*, 1.17.0 |

4. Features

Supports reading full database snapshots, continues reading the binlog even after a failure, and provides exactly-once processing.

CDC connectors for the DataStream API let users consume changes from multiple databases and tables in a single job, without deploying Debezium and Kafka.

CDC connectors for the Table/SQL API let users create a CDC source with SQL DDL to monitor changes on a single table.

5. Table/SQL API usage

A few steps are needed to set up a Flink cluster with the provided connectors.

First, install a Flink 1.17+ cluster (Java 8+).

Note: if you need to install Flink, refer to the author's corresponding blog on building a high-availability Flink cluster (standalone mode).
The jar packages used in this article are flink-connector-jdbc-3.1.1-1.17.jar and flink-sql-connector-mysql-cdc-2.2.1.jar.

Download the connector SQL jar (or build it yourself).

Put the downloaded jar package in FLINK_HOME/lib/.

Restart the Flink cluster.

Note: at present, versions above 2.4 must be compiled and built from source; the author built and uploaded the jars himself.

6. Use Flink CDC to perform streaming ETL on MySQL

This tutorial will show how to use Flink CDC to quickly build a streaming ETL for MySQL.

Suppose we store product data in MySQL and want to synchronize it to another MySQL instance.

In the following sections, we will show how to achieve this with Flink MySQL CDC. All exercises in this tutorial run in the Flink SQL CLI, and the whole process uses standard SQL syntax, without any Java/Scala code or IDE installation.

The architecture is outlined as follows:

7. Environment preparation

You need a running MySQL database. For details on how to install MySQL, refer to the author's blog on installing MySQL on Ubuntu.
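Beyond a plain installation, the mysql-cdc connector requires the source MySQL server to have binary logging enabled in ROW format, and the connection user needs replication privileges. A minimal sketch of the checks and grants, assuming a dedicated user (the name cdcuser and its password are placeholders, not from the original article):

```sql
-- Verify binlog settings: log_bin should be ON and binlog_format ROW
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';

-- Create a dedicated user for the connector (placeholder credentials)
CREATE USER 'cdcuser'@'%' IDENTIFIED BY 'cdcpassword';

-- Privileges the MySQL CDC connector needs to read snapshots and the binlog
GRANT SELECT, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cdcuser'@'%';
FLUSH PRIVILEGES;
```

If binlog_format is not ROW, set it in my.cnf and restart MySQL before continuing.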

Note: for other operating systems, check the corresponding database installation tutorials in other blogs.
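The tutorial assumes a database mydb with a products table already exists on the source MySQL. If you need to create one, a minimal schema with a couple of sample rows (the row values are illustrative, not from the original article) could look like this:

```sql
-- Source database and table matching the Flink DDL used below
CREATE DATABASE IF NOT EXISTS mydb;
USE mydb;

CREATE TABLE products (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512),
  PRIMARY KEY (id)
);

-- Illustrative sample data
INSERT INTO products (name, description) VALUES
  ('scooter', 'Small 2-wheel scooter'),
  ('car battery', '12V car battery');
```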

8. Create tables using Flink DDL in Flink SQL CLI

Start the Flink SQL CLI with the following command:

./bin/sql-client.sh

We should see the welcome screen of the CLI client.
First, enable checkpointing every 3 seconds:

-- Flink SQL                   
Flink SQL> SET execution.checkpointing.interval = 3s;

Edit the Flink SQL for the source table as follows:

CREATE TABLE products (
 id INT NOT NULL,
 name STRING,
 description STRING,
 PRIMARY KEY(id) NOT ENFORCED
) WITH (
 'connector' = 'mysql-cdc',      -- provided by the CDC jar; if the jar is missing, an error will prompt you to add it
 'hostname' = '192.168.50.163',  -- source database host; replace with your own (this is the author's machine)
 'port' = '3306',                -- source database port
 'username' = 'root',            -- source database user
 'password' = '*****',           -- source database password
 'database-name' = 'mydb',       -- source database
 'table-name' = 'products'       -- source table
);

Execute the following statement in the Flink SQL CLI to create a table that captures change data from the corresponding database table:

-- Flink SQL
Flink SQL> CREATE TABLE products (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
  ) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = '192.168.50.163',
    'port' = '3306',
    'username' = 'root',
    'password' = '****',
    'database-name' = 'mydb',
    'table-name' = 'products'
  );

Edit the Flink SQL for the target table as follows:

CREATE TABLE product (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
  ) WITH (
    -- provided by the flink-connector-jdbc jar; if the jar is missing, an error will prompt you to add it
    'connector' = 'jdbc',
    -- target database JDBC URL; replace with your own (this is the author's machine).
    -- Some newer MySQL versions require useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
    'url' = 'jdbc:mysql://192.168.50.163:3306/mydb1?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC',
    -- JDBC driver class to use
    'driver' = 'com.mysql.cj.jdbc.Driver',
    -- target database user
    'username' = 'root',
    -- target database password
    'password' = '***',
    -- target table
    'table-name' = 'product'
  );

Execute the following statement in the Flink SQL CLI to map the change-capture table to the target database table:

-- Flink SQL
Flink SQL> CREATE TABLE product (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
  ) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://192.168.50.163:3306/mydb1?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC',
    'driver' = 'com.mysql.cj.jdbc.Driver',
    'username' = 'root',
    'password' = 'root',
    'table-name' = 'product'
  );
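Note that the JDBC connector writes to an existing physical table; it does not create one. The target database mydb1 and table product therefore need to exist in the target MySQL beforehand. A minimal sketch (column sizes are illustrative):

```sql
-- Target database and table matching the JDBC sink DDL above
CREATE DATABASE IF NOT EXISTS mydb1;
USE mydb1;

CREATE TABLE product (
  id INT NOT NULL,
  name VARCHAR(255),
  description VARCHAR(512),
  PRIMARY KEY (id)
);
```

The primary key lets the sink perform upserts, so updates and deletes on the source are reflected correctly.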

9. Load the source data table to the target MySQL

Use Flink SQL to continuously write the rows queried from the products table into the product table in the target MySQL.

-- Flink SQL
Flink SQL> INSERT INTO product SELECT * FROM products;

After the statement is submitted, the Flink SQL CLI prints the ID of the newly created streaming job.

Insert new rows into the products table in the source database; the changes appear in the product table in the target database almost immediately.
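The synchronization can be checked directly from a MySQL client as well. A sketch, assuming the sample schemas above (the row values are illustrative):

```sql
-- On the source MySQL: add a new product
INSERT INTO mydb.products (name, description)
VALUES ('hammer', '16oz carpenter hammer');

-- On the target MySQL: after a few seconds, the row should be visible
SELECT * FROM mydb1.product WHERE name = 'hammer';
```

Updates and deletes on mydb.products should likewise propagate to mydb1.product.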

10. Viewing running jobs in the Flink web UI

Open the Flink web UI and check the Running Jobs list; the synchronization task should appear there with status RUNNING.
This concludes the first part of Flink CDC MySQL-to-MySQL synchronization; more advanced operations will be covered in later posts.


Origin blog.csdn.net/weixin_43114209/article/details/131553658