No downtime! How to migrate data

Data Migration Case Study

Original article: blog.piaoruiqing.com/blog/2019/1...

Foreword

When migrating data, a service outage is usually required to guarantee data consistency; during that window the system either cannot serve users at all or can only offer partial service. On top of that, verifying that the migration and the business logic are correct afterwards takes a lot of time, so the overall cost of a stop-the-world migration is considerable.

This article discusses how to migrate data without taking the service offline.

Case

The system contains a group of order tables with the following characteristics:

Database: MySQL

Table names: order_{0~19}, i.e. suffixes 0 through 19, 20 tables in total.

Primary key: order_id, the order ID, generated by the snowflake algorithm, so the creation time can be derived from the ID.

Original sharding strategy: order_id % 20

As the business has grown, each shard now holds more than 10 million rows, and continuing like this will cause serious performance problems, so the original shards need to be migrated.

Requirements:

  1. Migrate the data from the original 20 shards into new tables.
  2. The migration must not stop the service; the system must remain fully available throughout.
  3. Provide a complete rollback plan: no data generated during the migration may be lost, and no data may be repaired by hand.

Original sharding strategy

Analysis

Readers with database and table sharding experience may have already noticed that the original sharding strategy in this case is unreasonable; we will not dig into how it came about (the people responsible can no longer be found anyway).

Looking at the data in the tables: order data is bound to soar as time and traffic grow, and with a fixed number of shards, performance degrades as the volume increases. So after the migration the number of shards must not be fixed; even going from 20 tables to 100 would only postpone the day a bottleneck is reached.

Order data grows over time, and once the refund window has passed an order becomes cold data that is rarely accessed. Sharding orders by creation time is therefore a good choice. It is worth mentioning that order_id is generated by the snowflake algorithm, so the creation time can be extracted from order_id, which means order_id itself can be used directly as the shard key.
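To make this concrete, below is a minimal sketch of routing by order_id under both strategies. It assumes the common snowflake layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence) and a hypothetical custom epoch; both must be adjusted to match the actual ID generator, and the class name is invented for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public final class OrderShardRouter {

    // Assumption: the ID generator uses the common 41/10/12 snowflake layout,
    // i.e. the millisecond timestamp sits above 10 worker-ID bits and 12 sequence bits.
    private static final int TIMESTAMP_SHIFT = 22;
    // Hypothetical custom epoch (2019-01-01 UTC); replace with the generator's real epoch.
    private static final long CUSTOM_EPOCH_MS = 1_546_300_800_000L;

    private static final DateTimeFormatter MONTH_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMM").withZone(ZoneOffset.UTC);

    /** Old strategy: a fixed set of 20 tables, routed by modulo. */
    public static String oldTableName(long orderId) {
        return "order_" + (orderId % 20);
    }

    /** New strategy: one table per month, derived from the timestamp embedded in the ID. */
    public static String newTableName(long orderId) {
        long createdAtMs = (orderId >> TIMESTAMP_SHIFT) + CUSTOM_EPOCH_MS;
        return "order_" + MONTH_FORMAT.format(Instant.ofEpochMilli(createdAtMs)); // e.g. order_201910
    }
}
```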

New sharding strategy

Migration scheme analysis

Migration schemes exist at every layer, from the database layer up to the business layer. Let's list a few and compare:

  1. Business layer: hard-code dual writes in the business layer. From a chosen point in time, newly generated data is written to both the old and the new tables; after running this way for some time, the old data is migrated to the new tables. The cost is high and the coupling with business code is severe, so this option is not considered.

  2. Connection layer: an upgraded version of scheme 1 that intercepts SQL at the connection layer and performs the dual write there, decoupled from the business code. It still shares scheme 1's problem: the dual-write period is long, and the old data must be guaranteed not to change before it has been migrated.

  3. Triggers: use database triggers to synchronize newly generated data into the new tables; essentially the same idea as scheme 2.

  4. Database log: back up the database at some point in time T and migrate the backup's data into the new tables, then read the binlog from time T onwards and replay it into the new tables until the two sets of data are in sync, at which point the new code can go live.

  5. Masquerading as a replica (pseudo-slave): compared with scheme 4, the advantage is that the log files do not have to be read directly, which solves the problem that a cloud-hosted database does not conveniently expose its logs.

Comparing the options, schemes 4 and 5 are both viable. Since the database is hosted in the cloud and reading the log files directly is inconvenient, and since a mature open-source middleware, Canal, is available for scheme 5, scheme 5 is chosen.

Canal documentation: github.com/alibaba/can...
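As a rough illustration of how the pseudo-slave approach works with Canal, here is a minimal sketch based on the client example in Canal's documentation. It connects as a Canal client, pulls row-change events for the old order tables and hands them to a replay step. The server address, destination name, schema name and the upsertIntoMonthlyTable helper are assumptions for illustration; a real client would also need to handle deletes, retries and position management.

```java
import java.net.InetSocketAddress;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class OrderSyncClient {

    public static void main(String[] args) throws Exception {
        // Assumed Canal server address and destination ("example"); adjust to the real deployment.
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        connector.subscribe("mydb\\.order_.*");   // only listen to the old order tables (assumed schema name)
        connector.rollback();                     // re-deliver any batch that was fetched but never acked

        while (true) {
            Message message = connector.getWithoutAck(1024);   // pull a batch of binlog events
            long batchId = message.getId();
            if (batchId == -1 || message.getEntries().isEmpty()) {
                Thread.sleep(500);
                continue;
            }
            for (CanalEntry.Entry entry : message.getEntries()) {
                if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                    continue;   // skip transaction begin/end entries
                }
                CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                if (rowChange.getEventType() == CanalEntry.EventType.DELETE) {
                    continue;   // deletes are omitted in this sketch; handle them separately in a real client
                }
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    List<CanalEntry.Column> columns = rowData.getAfterColumnsList();
                    // Hypothetical helper: route the row to order_{yyyyMM} by the timestamp in order_id
                    // and upsert it there (idempotent, so replays are safe).
                    upsertIntoMonthlyTable(rowChange.getEventType(), columns);
                }
            }
            connector.ack(batchId);   // confirm the batch only after it has been applied
        }
    }

    private static void upsertIntoMonthlyTable(CanalEntry.EventType type,
                                               List<CanalEntry.Column> columns) {
        // Left out for brevity: build an INSERT ... ON DUPLICATE KEY UPDATE statement against
        // main library A, using OrderShardRouter.newTableName(order_id) from the earlier sketch.
    }
}
```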

Data migration


Rollback scheme analysis

After the new code goes live, nobody can guarantee it is one hundred percent problem-free. If the migration fails, we must be able to roll back, so the old data and the new data need to stay synchronized.

So, building on scheme 5 from the previous section: once traffic has been switched to the new cluster, we stop the old-to-new synchronization and start synchronizing in the opposite direction, from the new tables back to the old tables, with the sync program again masquerading as a replica. This keeps the old and new tables in sync, and if an exception occurs in production, traffic can simply be switched back to the old cluster.

Overall scheme design

Backup data source

  1. Execute FLUSH LOGS: this generates a new binlog file, and data replay for the migration starts from here (see the sketch after this list).
  2. Back up the order tables (order_{0~19}): copy the data from the source (old) master library A into backup library B.
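As a reference for step 1, here is a minimal JDBC sketch; the connection URL and credentials are placeholders, and the actual table backup in step 2 would typically be done with a tool such as mysqldump rather than application code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PrepareBinlogCheckpoint {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the source (old) master library A.
        String url = "jdbc:mysql://old-master-a:3306/mydb";
        try (Connection conn = DriverManager.getConnection(url, "admin", "secret");
             Statement stmt = conn.createStatement()) {
            // Rotate the binlog: replay for the migration will start from the new file.
            stmt.execute("FLUSH LOGS");
            // The table backup (order_0 .. order_19 into backup library B) happens after this,
            // e.g. with mysqldump; any writes made in between are covered by the new binlog.
        }
    }
}
```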


Restore and synchronize data

  1. Create enough new tables in main library A; the new order tables are sharded by month.
  2. Write a script that reads the order data from backup library B and writes it into the new order tables in main library A (a sketch of such a script follows this list).
  3. Use Canal to start synchronizing the old tables' data into the new tables; call this [sync process A].
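Here is a minimal sketch of the backfill script from step 2, assuming JDBC access to both libraries and reusing the hypothetical OrderShardRouter from the earlier sketch; the payload column stands in for the real order columns, and batching, retries and parallelism are omitted.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class BackfillOrders {

    public static void main(String[] args) throws Exception {
        // Placeholder URLs: backup library B (source) and main library A (target).
        try (Connection source = DriverManager.getConnection("jdbc:mysql://backup-b:3306/mydb", "ro", "secret");
             Connection target = DriverManager.getConnection("jdbc:mysql://main-a:3306/mydb", "rw", "secret")) {
            for (int i = 0; i < 20; i++) {          // iterate over the 20 old shards
                copyTable(source, target, "order_" + i);
            }
        }
    }

    private static void copyTable(Connection source, Connection target, String oldTable) throws Exception {
        try (Statement read = source.createStatement();
             ResultSet rs = read.executeQuery("SELECT order_id, payload FROM " + oldTable)) {
            while (rs.next()) {
                long orderId = rs.getLong("order_id");
                String newTable = OrderShardRouter.newTableName(orderId);   // order_{yyyyMM}
                // Idempotent write so the backfill can be re-run safely.
                String sql = "INSERT IGNORE INTO " + newTable + " (order_id, payload) VALUES (?, ?)";
                try (PreparedStatement write = target.prepareStatement(sql)) {
                    write.setLong(1, orderId);
                    write.setString(2, rs.getString("payload"));
                    write.executeUpdate();
                }
            }
        }
    }
}
```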

(Diagram: data synchronization)

Going live

  1. Build the new code and spin up a new cluster; confirm that it has started up completely.
  2. Execute FLUSH LOGS to generate a new binlog; synchronization of data from the new tables back to the old tables will start from this point.
  3. Switch traffic to the new cluster.
  4. Stop [sync process A].
  5. Start synchronizing data from the new tables back to the old tables.

Rollback

Test promptly after going live; if a serious abnormality is found, switch traffic back to the old cluster immediately.

Epilogue

  • FLUSH LOGS is executed before the source tables are backed up; even if there is a small gap between the two steps, the data will still end up consistent (the binlog is always the source of truth).
  • Data is priceless; proceed with caution.

If this article helped you, please give it a like. (¯▽¯)


Welcome to follow the author's WeChat official account (Code poetic).

[Copyright Notice]
This article was published on Pu Ruiqing's blog. Non-commercial reproduction is permitted, provided the original author, Pu Ruiqing, and the link blog.piaoruiqing.com are retained. For licensing or cooperation inquiries, please contact piaoruiqing@gmail.com.


Source: juejin.im/post/5db55fb551882551386a1117