Implementing a distributed MySQL database: gaining distributed capabilities without modifying application code

What does this project do

ShardingSphere-Proxy allows users to use Apache ShardingSphere like a native database.

Getting to know a technology usually starts with its official website, so let's look at how the official website defines ShardingSphere-Proxy:

It is positioned as a transparent database proxy and provides a server version that encapsulates the database binary protocol to support heterogeneous languages. It currently offers MySQL and PostgreSQL versions (the latter compatible with openGauss and other PostgreSQL-based databases). Any client compatible with the MySQL or PostgreSQL protocol (such as the MySQL command-line client, MySQL Workbench, or Navicat) can be used to operate on data, which is friendlier to DBAs.

First, one concept needs to be clear: ShardingSphere-Proxy is a service process. From the point of view of a connecting client program, it is no different from a MySQL database.

Why use Proxy

Once sharding (splitting databases and tables) or other rules are in place, data is distributed across multiple database instances, which inevitably makes management less convenient. Developers who use non-Java languages may also need the capabilities that ShardingSphere provides. These are exactly the situations ShardingSphere-Proxy is designed for.

1. Proxy application scenarios

In daily work, ShardingSphere-JDBC is often used to split databases and tables. Suppose you have a user table that has been horizontally split across databases by hashing the user ID, and the application connects to those databases through ShardingSphere-JDBC.

Here are several scenarios that come up in real work:

  1. The QA team wants to look at the record for user ID 123456 in the database, and you have to tell them which sub-table that user is in;
  2. Company management asks the engineering team for the total user growth and the user details for 2022;
  3. The company holds an 8th anniversary event and asks the engineering team for a list of active long-time users who registered eight or more years ago.

Once data has been split into multiple databases and tables, it is scattered across different physical tables, so these scenarios are not easy to handle. Writing new code for every such ad hoc request is cumbersome. This is where the subject of this article, ShardingSphere-Proxy, comes in.

ShardingSphere-Proxy hides the actual back-end database. For the client, it is using a database and does not need to care about how ShardingSphere coordinates the database behind it. It is more friendly to developers or DBAs who use non-Java languages.

For example, t_user is split into several real tables at the database level: t_user_0 through t_user_9. When operating through ShardingSphere-Proxy, the client only knows that there is a logical table t_user; routing to the real tables happens inside ShardingSphere-Proxy.

  1. Logical table: the logical name shared by horizontally split databases (tables) with the same structure; it is the identifier of the table in SQL. Example: user data is split into 10 tables according to the last digit of the primary key, namely t_user_0 to t_user_9, and their logical table name is t_user.
  2. Real table: a physical table that actually exists in a horizontally split database, that is, t_user_0 to t_user_9.
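The relationship between the logical table and its real tables can be sketched in a few lines of Python. This is only a simplified model of the "last digit of the primary key" example above, not ShardingSphere code; the function names `actual_tables` and `route` are hypothetical.

```python
from __future__ import annotations

# Simplified model of the example above; names are illustrative only.
LOGICAL_TABLE = "t_user"
TABLE_COUNT = 10  # real tables t_user_0 .. t_user_9


def actual_tables(logical_table: str = LOGICAL_TABLE, count: int = TABLE_COUNT) -> list[str]:
    """List every real table that backs the logical table."""
    return [f"{logical_table}_{i}" for i in range(count)]


def route(primary_key: int, logical_table: str = LOGICAL_TABLE, count: int = TABLE_COUNT) -> str:
    """Pick the real table from the last digit of the primary key."""
    return f"{logical_table}_{primary_key % count}"


if __name__ == "__main__":
    print(actual_tables())   # the ten real tables behind t_user
    print(route(123456))     # t_user_6
```

The client only ever writes `t_user` in its SQL; a mapping like `route` is what the proxy applies internally.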

2. The difference between JDBC and Proxy

After reading the description above, you may feel that ShardingSphere-Proxy and ShardingSphere-JDBC are very similar. So what is the difference between them?

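+-------------------------+------------------------------------+---------------------------------------+
|                         | ShardingSphere-JDBC                | ShardingSphere-Proxy                  |
+-------------------------+------------------------------------+---------------------------------------+
| Database                | Any                                | MySQL / PostgreSQL protocol databases |
| Connection consumption  | High                               | Low                                   |
| Heterogeneous languages | Java and other JVM-based languages | Any                                   |
| Performance             | Low loss                           | Slightly higher loss                  |
| Decentralized           | Yes                                | No                                    |
| Static entry            | No                                 | Yes                                   |
+-------------------------+------------------------------------+---------------------------------------+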

Briefly summarize the difference between the two:

  1. ShardingSphere-JDBC is a Jar package. Under the hood it re-implements JDBC components to perform SQL parsing, routing, rewriting, execution, and so on. The configuration for these features has to be added to the project, so it is intrusive to the application;
  2. ShardingSphere-Proxy is a service process, and in most cases it is positioned as a productivity tool that assists development and operations. It presents itself as a database, so connecting an application to it requires no code changes. Its SQL execution logic is the same as in ShardingSphere-JDBC, because the two share the same kernel.

Since ShardingSphere-Proxy is non-intrusive to the application and shares the same kernel, why would anyone still use ShardingSphere-JDBC?

  1. With ShardingSphere-JDBC the application talks to the database directly, which costs one network round trip. With ShardingSphere-Proxy, the application first connects to the Proxy (one round trip), and the Proxy then talks to the database (a second round trip);
  2. The extra layer in the call path can easily become a traffic bottleneck and adds potential risk to the application. Generally speaking, application programs are paired with ShardingSphere-JDBC.

Of course, ShardingSphere-JDBC and ShardingSphere-Proxy can also be deployed together. ShardingSphere-JDBC suits high-performance, lightweight OLTP applications developed in Java, while ShardingSphere-Proxy suits OLAP applications and scenarios where sharded databases need to be managed and maintained.

How to start

ShardingSphere-Proxy can be started in three ways: from the binary package, with Docker, or with Helm. Each can be deployed stand-alone or as a cluster. This article uses a stand-alone deployment from the binary package.

  1. Obtain the ShardingSphere-Proxy binary installation package from the download page;
  2. After unpacking it, modify conf/server.yaml and the files whose names start with the config- prefix to configure rules such as sharding and read/write splitting;
  3. On Linux, run bin/start.sh; on Windows, run bin/start.bat to start ShardingSphere-Proxy.

The downloaded file directory is as follows:

├── LICENSE
├── NOTICE
├── README.txt
├── bin # start and stop scripts
├── conf # service configuration: config files for sharding, read/write splitting, data encryption, and other features
├── lib # Jar packages
└── licenses

1. Copy the MySQL JDBC driver to the ext-lib directory

Download the driver mysql-connector-java-5.1.47.jar or mysql-connector-java-8.0.11.jar and place it in the ext-lib directory. The unpacked package does not contain an ext-lib directory, so you need to create it yourself.

2. Modify the conf/server.yaml configuration file

The server.yaml that ships with the package defaults to cluster mode. The configuration below is for stand-alone mode.

mode:
 type: Standalone # stand-alone mode
 repository:
   type: File
   props:
     path: /Users/xxx/software/apache-shardingsphere-5.1.0-shardingsphere-proxy/file # path for persisted files such as metadata configuration
 overwrite: false # whether to overwrite existing metadata

rules: # authentication information
 - !AUTHORITY
   users: # initial users
     - root@%:root
     - sharding@:sharding
   provider:
     type: ALL_PRIVILEGES_PERMITTED
 - !TRANSACTION
   defaultType: XA
   providerType: Atomikos
 - !SQL_PARSER
   sqlCommentParseEnabled: true
   sqlStatementCache:
     initialCapacity: 2000
     maximumSize: 65535
     concurrencyLevel: 4
   parseTreeCache:
     initialCapacity: 128
     maximumSize: 1024
     concurrencyLevel: 4

props: # common properties
 max-connections-size-per-query: 1
 kernel-executor-size: 16  # Infinite by default.
 proxy-frontend-flush-threshold: 128  # The default value is 128.
 proxy-opentracing-enabled: false
 proxy-hint-enabled: false
 sql-show: false
 check-table-metadata-enabled: false
 show-process-list-enabled: false
 # Proxy backend query fetch size. A larger value may increase the memory usage of ShardingSphere Proxy.
 # The default value is -1, which means set the minimum value for different JDBC drivers.
 proxy-backend-query-fetch-size: -1
 check-duplicate-table-enabled: false
 proxy-frontend-executor-size: 0 # Proxy frontend executor size. The default value is 0, which means let Netty decide.
 # Available options of proxy backend executor suitable: OLAP(default), OLTP. The OLTP option may reduce time cost of writing packets to client, but it may increase the latency of SQL execution
 # and block other clients if client connections are more than `proxy-frontend-executor-size`, especially executing slow SQL.
 proxy-backend-executor-suitable: OLAP
 proxy-frontend-max-connections: 0 # Less than or equal to 0 means no limitation.
 sql-federation-enabled: false
 # Available proxy backend driver type: JDBC (default), ExperimentalVertx
 proxy-backend-driver-type: JDBC

Note that if you run ShardingSphere-Proxy in stand-alone mode and later need to change the Proxy configuration, it is recommended to set mode.overwrite to true so that ShardingSphere-Proxy reloads the metadata when it starts.

3. Start ShardingSphere-Proxy

Run the start command: sh bin/start.sh. The default port is 3307; to use a different port, pass it as an argument to the startup script: sh bin/start.sh 3308.

To check whether ShardingSphere-Proxy started successfully, follow the log with tail -100f logs/stdout.log. If the last line looks like the following, startup succeeded:

[INFO ] xxx-xx-xx xx:xx:xx.xxx [main] o.a.s.p.frontend.ShardingSphereProxy - ShardingSphere-Proxy Standalone mode started successfully

Hands-on scenarios

This chapter works through the real-world scenarios described earlier and fulfills those requirements with ShardingSphere-Proxy.

1. Initialize the database table

# Create the databases
CREATE DATABASE user_sharding_0;
CREATE DATABASE user_sharding_1;

# Create the tables in user_sharding_0
USE user_sharding_0;

CREATE TABLE `t_user_0` (
  `id` bigint(20) NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `create_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1;

CREATE TABLE `t_user_1` (
  `id` bigint(20) NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `create_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1;

# Create the tables in user_sharding_1
USE user_sharding_1;

CREATE TABLE `t_user_0` (
  `id` bigint(20) NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `create_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1;

CREATE TABLE `t_user_1` (
  `id` bigint(20) NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `create_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = latin1;

2. Initialize the Proxy sharding configuration

schemaName: sharding_db

dataSources:
  ds_0:
    url: jdbc:mysql://127.0.0.1:3306/user_sharding_0?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1
  ds_1:
    url: jdbc:mysql://127.0.0.1:3306/user_sharding_1?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1

rules:
- !SHARDING
  tables:
    t_user:
      actualDataNodes: ds_${0..1}.t_user_${0..1}
      tableStrategy:
        standard:
          shardingColumn: user_id
          shardingAlgorithmName: t_user_inline
      keyGenerateStrategy:
        column: user_id
        keyGeneratorName: snowflake
  bindingTables:
    - t_user
  defaultDatabaseStrategy:
    standard:
      shardingColumn: user_id
      shardingAlgorithmName: database_inline
  defaultTableStrategy:
    none:

  shardingAlgorithms:
    database_inline:
      type: INLINE
      props:
        algorithm-expression: ds_${user_id % 2}
    t_user_inline:
      type: INLINE
      props:
        algorithm-expression: t_user_${user_id % 2}

  keyGenerators:
    snowflake:
      type: SNOWFLAKE

3. Sharding test

Use the MySQL command-line client to connect to the ShardingSphere-Proxy server. If you are running in a Docker deployment, pass the host machine's IP address with -h, because 127.0.0.1 is not reachable from inside the container.

# Replace each {xx} with the actual value
mysql -h {ip} -u {username} -p{password} -P 3307
# Example command
mysql -h 127.0.0.1 -u root -proot -P 3307

ShardingSphere-Proxy can also be accessed from database management tools such as Navicat for MySQL, DataGrip, MySQL Workbench, and TablePlus.

After connecting, list the databases exposed by the proxy. The result matches the schemaName in the configuration file.

mysql> show databases;
+-------------+
| schema_name |
+-------------+
| sharding_db |
+-------------+
1 row in set (0.02 sec)

Run an INSERT statement against t_user to add six user rows: three created in 2021 and three created in 2022.

mysql> use sharding_db;
mysql> INSERT INTO t_user (id, user_id, create_date) values(1, 1, '2021-01-01 00:00:00'), (2, 2, '2021-01-01 00:00:00'), (3, 3, '2021-01-01 00:00:00'), (4, 4, '2022-01-01 00:00:00'), (5, 5, '2022-02-01 00:00:00'), (6, 6, '2022-03-01 00:00:00');
Query OK, 6 rows affected (0.16 sec)

mysql> select * from t_user;
+----+---------+---------------------+
| id | user_id | create_date         |
+----+---------+---------------------+
|  2 |       2 | 2021-01-01 00:00:00 |
|  4 |       4 | 2022-01-01 00:00:00 |
|  6 |       6 | 2022-03-01 00:00:00 |
|  1 |       1 | 2021-01-01 00:00:00 |
|  3 |       3 | 2021-01-01 00:00:00 |
|  5 |       5 | 2022-02-01 00:00:00 |
+----+---------+---------------------+

At this point the data is spread across the user_sharding_0 and user_sharding_1 databases.
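To see where each row ends up, the two INLINE expressions from the sharding configuration can be mimicked in Python. This is a minimal sketch that only reproduces the arithmetic of `ds_${user_id % 2}` and `t_user_${user_id % 2}`; the function names `route` and `group_by_node` are hypothetical, and the real routing happens inside the proxy.

```python
from collections import defaultdict


def route(user_id: int) -> str:
    """Mimic the two INLINE expressions from the sharding configuration."""
    data_source = f"ds_{user_id % 2}"   # algorithm-expression: ds_${user_id % 2}
    table = f"t_user_{user_id % 2}"     # algorithm-expression: t_user_${user_id % 2}
    return f"{data_source}.{table}"


def group_by_node(user_ids):
    """Group user IDs by the data node they are routed to."""
    placement = defaultdict(list)
    for user_id in user_ids:
        placement[route(user_id)].append(user_id)
    return dict(placement)


if __name__ == "__main__":
    for node, ids in sorted(group_by_node(range(1, 7)).items()):
        print(node, ids)
    # ds_0.t_user_0 [2, 4, 6]
    # ds_1.t_user_1 [1, 3, 5]
    print(route(123456))  # ds_0.t_user_0
```

Because both expressions use `user_id % 2`, the sketch suggests that only ds_0.t_user_0 and ds_1.t_user_1 receive rows under this configuration, which matches the grouping of even and odd IDs in the query output above.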

Back to the first question: how do we locate a user's data? Because ShardingSphere-Proxy has already aggregated the real tables into one logical table, we can simply query it directly.

mysql> select * from t_user where user_id = 1;
+----+---------+---------------------+
| id | user_id | create_date         |
+----+---------+---------------------+
|  1 |       1 | 2021-01-01 00:00:00 |
+----+---------+---------------------+
1 row in set (0.01 sec)

The second question asks for the number of new users in 2022 and their details.

mysql> select count(*) from t_user where create_date >= '2022-01-01 00:00:00';
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.10 sec)

mysql> select * from t_user where create_date >= '2022-01-01 00:00:00';
+----+---------+---------------------+
| id | user_id | create_date         |
+----+---------+---------------------+
|  4 |       4 | 2022-01-01 00:00:00 |
|  6 |       6 | 2022-03-01 00:00:00 |
|  5 |       5 | 2022-02-01 00:00:00 |
+----+---------+---------------------+
3 rows in set (0.02 sec)

The third question can be answered the same way, by filtering t_user on create_date.
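The second and third questions filter on create_date, which is not the sharding column, so the statement cannot be narrowed down to a single real table. Conceptually it has to run against every data node and the partial results are then merged. The sketch below models that scatter-and-merge step with in-memory data; the `SHARDS` dictionary and both function names are hypothetical stand-ins, and this is not how ShardingSphere implements its merge logic internally.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for the two populated real tables.
SHARDS = {
    "ds_0.t_user_0": [
        {"id": 2, "user_id": 2, "create_date": datetime(2021, 1, 1)},
        {"id": 4, "user_id": 4, "create_date": datetime(2022, 1, 1)},
        {"id": 6, "user_id": 6, "create_date": datetime(2022, 3, 1)},
    ],
    "ds_1.t_user_1": [
        {"id": 1, "user_id": 1, "create_date": datetime(2021, 1, 1)},
        {"id": 3, "user_id": 3, "create_date": datetime(2021, 1, 1)},
        {"id": 5, "user_id": 5, "create_date": datetime(2022, 2, 1)},
    ],
}


def query_all_shards(predicate):
    """Apply the same filter to every shard and concatenate the partial results."""
    rows = []
    for shard_rows in SHARDS.values():
        rows.extend(row for row in shard_rows if predicate(row))
    return rows


def count_all_shards(predicate):
    """Sum the per-shard counts, as a merged COUNT(*) would."""
    return sum(
        sum(1 for row in shard_rows if predicate(row))
        for shard_rows in SHARDS.values()
    )


if __name__ == "__main__":
    def created_in_2022(row):
        return row["create_date"] >= datetime(2022, 1, 1)

    print(count_all_shards(created_in_2022))                         # 3
    print([row["id"] for row in query_all_shards(created_in_2022)])  # [4, 6, 5]
```

For the anniversary scenario only the predicate changes, for example to a cutoff date eight years before the event. Without the proxy, this per-shard loop is the kind of ad hoc code that would have to be written for every such request.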

Final summary

This article walked through the basic concepts of ShardingSphere-Proxy, introduced the operations and maintenance scenarios that arise after splitting databases and tables, and demonstrated how ShardingSphere-Proxy solves them.

Hopefully you now have a deeper understanding of ShardingSphere-Proxy. The key points are that ShardingSphere-Proxy is positioned as a tool that assists development and operations, how ShardingSphere-JDBC and ShardingSphere-Proxy differ, and what the advantages, disadvantages, and implementation approach of each are. With that foundation, the source code of both becomes easier to understand.

Origin blog.csdn.net/csdn1234561231/article/details/130057925