[MySQL] Simple data rollback through Binary Log (1)

I. Introduction

Yes, that's right, I've been in the water for a while, deeply reflecting on sending a few. Some time ago, due to the possibility of misoperation in batch operations such as excel, the work item required the ability to roll back data according to batches of operations. In the process of development, I came into contact with MySQL's Binary Log, and I feel that I have gained something to record it.

2. The concept of Binary Log

First of all, we need to understand what Binary Log is (click for details):

Binary Log (binary file), which contains "events" that describe database changes, such as operations that create tables or alter table data. If a row-based log is used, it can also contain statement events that have changed (eg, DELETE events with no corresponding row).

That is to say, your operations on the database, CRUD including INSERT and DELETE, and binlog (abbreviation in the command) will be included. Then, if we can parse (because we can know from the name of binlog, this is a binary file, not Humans can read its content, and then the executed statement can be reversed, and the misoperated data can be recovered.

This is also one of the purposes of binlog: data recovery

Another use of binlog is for master-slave replication. We all know that in the current big data context, the conventional single database can no longer meet the demand for access, so a database cluster has emerged: the main database performs write operations, and the slave database performs read operations, thereby reducing the access pressure of the database, and in order to To ensure that the content of the database is consistent, binlog is used to ensure that, as shown in the following figure: It is not detailed here.

3. View Binary Log through shell

After understanding the concept of binlog, let's take a look at binlog through the shell.

First, add the following configuration to my.cnf:

[mysqld]
log-bin=mysql-bin
binlog-format=ROW #选择row模式
server_id=1 #避免和slave机器重复
log_bin_basename=xxx 可选
log_bin_index=xxx 可选

Restart MySQL after saving.

Enter MySQL Command:

mysql> show variables like '%log_bin%'; 查看binglog路径
+---------------------------------+---------------------------------------+
| Variable_name                   | Value                                 |
+---------------------------------+---------------------------------------+
| log_bin                         | ON                                    |
| log_bin_basename                | /usr/local/mysql/data/mysql-bin       |
| log_bin_index                   | /usr/local/mysql/data/mysql-bin.index |
| log_bin_trust_function_creators | OFF                                   |
| log_bin_use_v1_row_events       | OFF                                   |
| sql_log_bin                     | ON                                    |
+---------------------------------+---------------------------------------+
  • log_bin: on means Binary Log is turned on
  • log_bin_basename: The base file name of the binary log, which can be specified in my.cnf
  • log_bin_index: The index file of the binlog file, which can be specified in my.cnf
mysql> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000001 |   9309624 |
| mysql-bin.000002 |   9008629 |
| mysql-bin.000003 |    229080 |
| mysql-bin.000004 |  15410010 |
| mysql-bin.000005 |       177 |
| mysql-bin.000006 |   5798399 |
| mysql-bin.000007 |       177 |
+------------------+-----------+

Display all binary log files and file sizes of the current database

Knowing this, exit MySQL Command and view it in the shell:

> sudo -u mysql mysqlbinlog /usr/local/mysql/data/mysql-bin.000030

Since my /usr/local/mysql/data is only given to the mysql user by default when installing MySQL, I need to add -u to switch to mysql.

At this point, you can view the contents of the binary file (some intercepted):

# at 1341475
#180416 15:58:45 server id 1  end_log_pos 1341582 CRC32 0x0ca6c030  Table_map: `user-center`.`t_management_entity_role` mapped to number 127
# at 1341582
#180416 15:58:45 server id 1  end_log_pos 1341686 CRC32 0x33552cef  Write_rows: table id 127 flags: STMT_END_F

BINLOG '
tVfUWhMBAAAAawAAAI54FAAAAH8AAAAAAAEADnNoLXVzZXItY2VudGVyABh0X21hbmFnZW1lbnRf
ZW50aXR5X3JvbGUADAMPDw8PDwEPDxIPEhJgADYAYAC0AAMAAwDAAADAAAASADDApgw=
tVfUWh4BAAAAaAAAAPZ4FAAAAH8AAAAAAAEAAgAM//8Q8IkAAAARc3ViX2VtcGxveWVlX2RlcHQG
5qCh5belDXNjaG9vbF93b3JrZXIBMQIBMAN6a2qZn6D7wAN6a2qZn6D7wO8sVTM=
'/*!*/;
# at 1341686
#180416 15:58:45 server id 1  end_log_pos 1341717 CRC32 0x1fdc2123  Xid = 22495
COMMIT/*!*/;
# at 1341717
#180416 16:41:12 server id 1  end_log_pos 1341740 CRC32 0xca0bf05c  Stop
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Seeing this, I feel that the specific contents of BINLOG are still readable by non-human beings. If you want to know the secrets, it seems that you still have to read the MySQL development manual.


Fourth, the analysis principle of binlog and the open source analysis tool on GitHub

4.1 Several formats of binlog

To understand the analysis principle of MySQL, of course, you must read the MySQL development manual carefully from beginning to end. If you want to know more, you can click here . There are detailed information about MySQL. The author picks up the part that introduces binlog to briefly talk about it.

First of all, we must understand the format of binlog. There are three formats of binlog: STATEMENT, ROW, MIXED. Let's introduce them one by one:

  1. STATEMENT
  • Literally means to describe. The SQL statement relative to the operation is recorded. For example DELETE FROM foo WHERE id = 1, if it is executed on the console, this statement will be added to the binlog. The benefits are clear: intuitive.
  • Yes, there is always a but~~, but the correctness of the logging is not guaranteed .
  • Client may not generate row events
  1. ROW
  • Ensure the correctness of logging
  • DML changes may only be recorded in ROW mode, not in STATEMENT mode. The content change of each line consists of Before Image (BI) and After Image (AI). BI records each column of data before the row is changed, while AI records each column of data after the change. There are three types of log_event:
    • Write_rows_log_event: Adds a new row to the table, and AI.
    • Update_rows_log_event: Modify rows that already exist in the table, both AI and BI.
    • Delete_rows_log_event: Delete existing rows in the table, only BI.
  1. MIXED
  • To ensure the correctness of log records , first use STATEMENT records, and if they cannot be recorded correctly, use ROW mode to record.
  • The difficulty of processing is increased, and two implementations need to be written.

It can be seen from the above that the STATEMENT mode is unavailable, because it cannot guarantee the correctness of the log, and the MIXED mode will increase the complexity of the code. Considering two situations, it increases the workload of the code, so the ROW mode is used in the implementation. is common practice.


4.2 event format of binlog

The Binary Log is a staple of reading. It focuses on the message body format and event format of binlog. The author chooses a part to talk about.

As mentioned earlier, the operation of the database is written into the binlog in the form of event events, so what is the format of the event? All event events have a common common structure, consisting of an event title and event data:

+===================+
| event header      |
+===================+
| event data        |
+===================+

The event header and data parts have different changes under different MySQL versions. Specifically:

  • v1: used in MySQL 3.23
  • v3: yes yes, no v2, used in MySQL 4.0.2 to 4.1
  • v4: used in MySQL 5.0 and above

The previous version 5.0 will not be introduced. Let’s look directly at the event structure of the v4 version:

+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    |
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    |
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
|        +----------------------------+
|        | extra_headers    19 : x-19 |
+=====================================+
| event  | fixed part        x : y    |
| data   +----------------------------+
|        | variable part              |
+=====================================+
  • bytes
  • Length of header = x bytes
  • length of data = (length of event - x) bytes
  • Length of fixed part = y bytes variable length.
  • x is defined by format description event (FDE), currently x is 19, i.e. extra_headers is empty
  • y specifies the type of event, which is also defined in FDE. The fixed part of the same event has the same length, and the length of different events is different.


Let's take a look at the format of the event data section, taking the format of the inserted row event as an example ( Write_rows_log_event/WRITE_ROWS_EVENT ):

  1. Fixed data section
  • 6 bytes: the id of the table
  • 2 bytes: keep spare
  1. variable data part
  • Packed integers (unsigned integers in a special format that can store 8-byte integers, see here for the representation method ): the number of columns in the table.
  • Variable size: use bits to indicate whether each column is used, one bit per column, if there are N columns, use INT((N+7)/8)bytes
  • Variable size (for UPDATE_ROWS_LOG_EVENT), same as above, indicating whether each column is used after the update
  • Variable size: zero or more lines, the cut-off position is determined by the event_length in the event header, and the format of each line is as follows:
    • Variable size: bit to indicate whether each field in the row is NULL, 1 means null, 0 means not null, only the columns in the second part of the data part will appear here. INT((N+7)/8)bytes required
    • Variable size: row image, containing the values ​​of all table fields. This will only list table fields that are used (according to the second field of the variable data section) and non-NULL (according to the previous field).
    • For UPDATE_ROWS_LOG_EVENT, the above two are repeated, indicating the updated value

This is why, when converting time in many open source projects, there are inexplicable conversions of several bytes.


4.3 Open source parsing binlog tool on GitHub

Here is a brief introduction to several known binlog parsing projects:

  1. canal

Incremental subscription & consumption component of Alibaba MySQL database binlog. After understanding the content that can be parsed by binlog, I feel that canal is doing really well. The original binlog parsed does not have column name information, column encoding, and column type. Canal adds an extra layer on this basis to complete the corresponding The column information improves the basic demands of the public business to understand binlog. In fact, adding a column name is not as simple as sending show create table xxxit. Considering that the column may be added or deleted, the column consumed at the time of t0 may not correspond to the column at the time of t1 at this time, and there will be many problems in the middle.

  1. mysql-binlog-connector-java

The predecessor was open-replicator . After the author no longer updated the code, the author completely rewrote the project and added many new features of MySQL5.x. The parsing result is very native to canal. But also let the author worship.

  1. binlog2sql

The first two are java language projects, this is written in python, and the SQL you want is parsed from MySQL binlog. Depending on the options, you can get raw SQL, rollback SQL, INSERT SQL with primary keys removed, etc. It is a project closest to the author's needs. Basically, it can be used directly with the project code. However, the author's obsessive-compulsive disorder occurs. Since the project written is a java project, although jython can be realized, the author still wants to toss other things. Did not use. o(TωT)o 

V. Summary

The first part first records the entire operation process, and the second part writes the specific implementation process. Thank you for watching, if there is anything wrong with the description, please correct me and make progress together with everyone!

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=324813089&siteId=291194637