MySQL two-phase commit

What is a two-phase commit?
When data is modified, the change is first recorded in the redo log buffer and the binlog cache. The redo log is flushed to disk first, which puts the transaction in the prepare state. Only after that flush succeeds can the binlog cache be flushed to disk. Once the transaction's binlog is completely on disk, an XID is recorded, and a commit mark is then written to the redo log file (commit phase).

Why is there a two-phase commit in MySQL?
When modifying data, MySQL first reads the data from disk into memory, modifies it in memory, and records the change in the redo log buffer. It then writes the transaction log to the redo log file on disk through a system call. Finally, after the transaction is committed, the modified data in memory starts being written to disk.
If the redo log and the binlog were written independently, a crash between the two writes would leave them inconsistent:
1. The redo log is written but the binlog is not. After crash recovery, the master applies the redo log and recovers the data, but because there is no binlog for the change, the slave never replays it. For this data the master is "newer" than the slave, causing master-slave inconsistency.
2. The binlog is written but the redo log is not. This mirrors the previous case: the slave replays the binlog while the master loses the change after recovery, so the slave is "newer" than the master, which again causes master-slave inconsistency.

Two-phase commit solves this problem. During crash recovery:
If the redo log has been committed, commit the transaction directly.
If the redo log is in the prepare state, check whether the binlog corresponding to the transaction is complete.
If it is complete, commit the transaction.
If not, roll back the transaction.
Two-phase commit exists to ensure the logical consistency of the redo log and the binlog.
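The recovery rules above can be sketched as a small decision function. This is only an illustration: the names RedoState and recover are hypothetical and do not correspond to any MySQL API.

```python
from enum import Enum


class RedoState(Enum):
    # Hypothetical labels for the state recorded in the redo log.
    PREPARE = "prepare"
    COMMIT = "commit"


def recover(redo_state, binlog_complete):
    """Return the crash-recovery action for one transaction.

    redo_state      -- state recorded in the redo log for the transaction
    binlog_complete -- True if the transaction's binlog was fully written
    """
    if redo_state is RedoState.COMMIT:
        # The redo log already carries the commit mark: commit directly.
        return "commit"
    if redo_state is RedoState.PREPARE:
        # Only prepared: the completeness of the binlog decides the outcome.
        return "commit" if binlog_complete else "rollback"
    raise ValueError(f"unknown redo state: {redo_state!r}")
```

For example, recover(RedoState.PREPARE, False) returns "rollback", while recover(RedoState.PREPARE, True) returns "commit".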

MySQL two-phase commit principle

(1) Prepare phase: write the redo log
1. Set undo state = TRX_UNDO_PREPARED;
2. Flush the redo log generated by the transaction's updates;
(2) Commit phase: write the binlog
1. Write the binlog generated by the transaction to the file and flush it to disk;
2. Set the status of the undo page to TRX_UNDO_TO_FREE or TRX_UNDO_TO_PURGE; // marks the rollback segment as ready to be cleaned up
3. Record the binlog offset corresponding to the transaction and write it to the system tablespace.
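The order of these steps can be modelled with a toy function. This is a minimal sketch under stated assumptions: the dict-based transaction, the list standing in for the binlog file, and the crash_after parameter are all hypothetical and only illustrate the sequence, not InnoDB's actual data structures.

```python
def two_phase_commit(trx, binlog, system_tablespace, crash_after=None):
    """Run the prepare and commit steps for one transaction.

    trx               -- dict with at least an "xid" key (hypothetical layout)
    binlog            -- list standing in for the on-disk binlog file
    system_tablespace -- dict standing in for the system tablespace
    crash_after       -- "prepare" or "binlog" to stop early and mimic a crash
    """
    # (1) Prepare phase: mark the undo state and flush the redo log.
    trx["undo_state"] = "TRX_UNDO_PREPARED"
    trx["redo_flushed"] = True
    if crash_after == "prepare":
        return

    # (2) Commit phase, step 1: write the binlog and flush it to disk.
    binlog.append(trx["xid"])
    if crash_after == "binlog":
        return

    # Step 2: the undo page can now be cleaned up.
    trx["undo_state"] = "TRX_UNDO_TO_PURGE"
    # Step 3: record the binlog offset of this transaction.
    system_tablespace["binlog_offset"] = len(binlog)
```

Calling the function with crash_after="prepare" leaves the transaction prepared with no binlog entry, which is exactly the state that crash recovery later has to resolve.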
Two-phase commit is a commonly used solution for keeping data logically consistent across systems. Its weakness is that it can block. Three-phase commit was proposed to address this by adding a pre-commit stage on top of two-phase commit.
(3) Why two phases are needed, shown by contradiction:
Suppose the row with ID=2 currently has c = 0 and a statement adds 1 to c. Without two-phase commit, the two logs would be written one after the other, in one of two orders.
A. Write the redo log first, then the binlog. Suppose the MySQL process restarts abnormally after the redo log has been written but before the binlog has been written. Once the redo log is written, the data can be recovered even if the system crashes, so after recovery the value of c in this row is 1.
However, because the crash happened before the binlog was written, this statement is not recorded in the binlog. When the binlog is backed up later, the saved binlog does not contain this statement.
If this binlog is later used to restore a temporary database, the update is missing because the statement's binlog was lost. The value of c in the restored row is 0, which differs from the value in the original database.
B. Write the binlog first, then the redo log. If a crash occurs after the binlog has been written, the redo log has not been written yet, so the transaction is invalid after crash recovery and the value of c in this row is 0. But the binlog already contains the entry "change c from 0 to 1". When the binlog is used for a restore later, one extra transaction appears: the value of c in the restored row is 1, which differs from the value in the original database.
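Both orderings can be replayed with a few lines of code. The sketch below is only an illustration of the argument; the function name run and its parameters are hypothetical.

```python
def run(order, crash_after_first):
    """Apply 'c: 0 -> 1' while writing the two logs in the given order.

    order             -- tuple such as ("redo", "binlog")
    crash_after_first -- if True, crash right after the first log is written

    Returns (c in the original database after crash recovery,
             c in a temporary database restored from the binlog).
    """
    written = set()
    for log in order:
        written.add(log)
        if crash_after_first:
            break  # the second log never reaches disk
    original = 1 if "redo" in written else 0    # recovery relies on the redo log
    restored = 1 if "binlog" in written else 0  # the restore relies on the binlog
    return original, restored
```

With a crash after the first write, run(("redo", "binlog"), True) gives (1, 0) and run(("binlog", "redo"), True) gives (0, 1); in both cases the two databases disagree. Without a crash both values are 1.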
(4) There are three differences between the redo log and the binlog.
1. The redo log is specific to the InnoDB engine; the binlog is implemented by the MySQL Server layer and can be used by all engines.
2. The redo log is a physical log that records "what modification was made on a certain data page"; the binlog is a logical log that records the original logic of the statement, such as "add 1 to the c field of the row with ID=2".
3. The redo log is written circularly: its space is fixed and will eventually be used up. The binlog is append-only: "append" means that after a binlog file reaches a certain size, MySQL switches to the next file and does not overwrite earlier logs.
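The third difference can be pictured with two toy log classes. This is a simplified sketch: CircularLog and AppendLog are hypothetical names, and the circular version overwrites the oldest record unconditionally, whereas InnoDB only reuses redo log space once the corresponding changes have been flushed to the data files.

```python
class CircularLog:
    """Fixed-size log: when the space is used up, writing wraps around."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next_pos = 0

    def write(self, record):
        self.slots[self.next_pos] = record
        # Wrap back to the start once the end of the space is reached.
        self.next_pos = (self.next_pos + 1) % len(self.slots)


class AppendLog:
    """Append-only log: when a file is full, switch to a new file."""

    def __init__(self, max_records_per_file):
        self.max_records = max_records_per_file
        self.files = [[]]

    def write(self, record):
        if len(self.files[-1]) >= self.max_records:
            self.files.append([])  # earlier files are never overwritten
        self.files[-1].append(record)
```

Writing three records into a CircularLog of capacity 2 overwrites the first slot, while the same writes into an AppendLog with two records per file produce a second file and keep everything.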
(5) Binlog group commit:
A queue mechanism is introduced to ensure that the InnoDB commit order is consistent with the order in which the binlog is written to disk. Transactions are grouped, and the binlog flush for the whole group is handed to a single transaction, which is how group commit is achieved. Binlog group commit divides the commit into three stages: the FLUSH stage, the SYNC stage and the COMMIT stage. Each stage has a queue, and each queue is protected by a mutex. By convention, the first thread to enter a queue is the leader and the other threads are followers. All the work is done by the leader; after the leader has completed all actions, it notifies the followers that the flush is finished. The basic process of binlog group commit is as follows:
FLUSH stage
(1) Hold the Lock_log mutex [leader holds, followers wait]
(2) Get the group of binlogs in the queue (all transactions in the queue)
(3) Write the binlogs to the I/O cache
(4) Notify the dump thread to dump the binlog
SYNC stage
(1) Release the Lock_log mutex, hold the Lock_sync mutex [leader holds, followers wait]
(2) Flush the group of binlogs to disk (the sync action, the most time-consuming step, assuming sync_binlog is 1)
COMMIT stage
(1) Release the Lock_sync mutex, hold the Lock_commit mutex [leader holds, followers wait]
(2) Traverse the transactions in the queue and perform the InnoDB commit one by one
(3) Release the Lock_commit mutex
(4) Wake up the threads waiting in the queue
Note: There are multiple queues, each protected by its own mutex, and the queues are passed through in order. Because the first thread to enter a queue is its leader, the leader of the FLUSH stage may become a follower in the SYNC stage, but a follower always remains a follower.
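The leader/follower idea can be sketched in a single-threaded toy model. Everything here is an assumption made for illustration: StageQueue and group_commit are hypothetical names, the mutexes are omitted, and the same leader is kept for all three stages, although as noted above the leader may change between stages in the real mechanism.

```python
class StageQueue:
    """Queue for one stage; the first transaction to enter is the leader."""

    def __init__(self):
        self.members = []

    def enroll(self, trx):
        self.members.append(trx)
        return len(self.members) == 1  # True means this transaction leads

    def fetch_all(self):
        group, self.members = self.members, []
        return group


def group_commit(transactions):
    """Run one non-empty group through FLUSH, SYNC and COMMIT.

    Returns the list of actions performed, in order.
    """
    flush_queue = StageQueue()
    leader = None
    for trx in transactions:
        if flush_queue.enroll(trx):
            leader = trx
    group = tuple(flush_queue.fetch_all())

    actions = [
        ("FLUSH", leader, group),  # leader buffers every binlog in the group
        ("SYNC", leader, group),   # one sync covers the whole group
    ]
    for trx in group:              # engine commits follow the queue order
        actions.append(("COMMIT", trx))
    return actions
```

For three transactions entering in the order t1, t2, t3, the sketch performs one FLUSH and one SYNC led by t1, followed by three commits in the same order, which is the property the queue is meant to guarantee.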

MySQL's current group commit mechanism addresses both consistency and performance: consistency is ensured by two-phase commit, and disk I/O performance is improved by group commit of the redo log and the binlog.

 

Origin blog.csdn.net/jbossjf/article/details/126216851