What is MVCC?
What are the benefits of using MVCC (problems solved)
There are 4 isolation levels in SQL: read uncommitted, read committed, repeatable read, and serializable. The table below shows how each level handles the three read problems (√ means the problem is prevented, × means it can still occur):
Isolation level | dirty read | non-repeatable read | phantom read |
Read uncommitted | × | × | × |
Read committed | √ | × | × |
Repeatable read | √ | √ | × |
Serializable | √ | √ | √ |
As the table above shows, the isolation level rises from top to bottom: the higher the isolation level, the lower the performance, but the fewer the data concurrency problems, and serializable has the slowest performance of all. However, with MVCC (multi-version concurrency control), the repeatable read level can also solve the problem of phantom reads, which greatly improves SQL performance. As shown below:
Isolation level | dirty read | non-repeatable read | phantom read |
Read uncommitted | × | × | × |
Read committed | √ | × | × |
Repeatable read / Serializable | √ | √ | √ |
The repeatable read and serializable levels use the MVCC + next-key lock mechanism.
MVCC does not have to use the lock mechanism: it solves the problems of non-repeatable reads and phantom reads in an optimistic, lock-free way! It can replace row-level locks in most cases and reduce system overhead.
Implementation principle of MVCC
MVCC is implemented with three components: hidden fields, the undo log version chain, and the ReadView.
What is ReadView
In the MVCC mechanism, when multiple transactions update the same row, multiple historical versions (snapshots) of that row are generated, and these snapshots are stored in the undo log. If a transaction wants to query this row, which version should it read? This is the visibility problem that the ReadView solves.
A ReadView is the read view that a transaction generates when it performs a snapshot read under the MVCC mechanism. At that moment, a snapshot of the current state of the database system is taken: InnoDB builds an array for each transaction that records the IDs of the transactions currently active (started but not yet committed) in the system.
The ReadView mainly contains 4 important fields:
1. creator_trx_id: the ID of the transaction that created this ReadView
2. trx_ids: the list of IDs of the transactions that were active when the ReadView was generated
3. up_limit_id: the smallest transaction ID among the active transactions
4. low_limit_id: the system’s maximum transaction ID + 1
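To make the four fields concrete, here is a minimal Python sketch of a ReadView and the visibility check that is usually described for InnoDB. The `ReadView` class and the `is_visible` function are illustrative names, not InnoDB's actual code, and the sketch assumes transaction IDs are plain increasing integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int   # transaction that created this view
    trx_ids: frozenset    # IDs of transactions active when the view was generated
    up_limit_id: int      # smallest active transaction ID
    low_limit_id: int     # system's maximum transaction ID + 1


def is_visible(view: ReadView, trx_id: int) -> bool:
    """Decide whether a row version written by trx_id can be read through this view."""
    if trx_id == view.creator_trx_id:
        return True       # a transaction always sees its own changes
    if trx_id < view.up_limit_id:
        return True       # committed before any active transaction started
    if trx_id >= view.low_limit_id:
        return False      # started after the view was generated
    # In between: visible only if that transaction was not active, i.e. had committed.
    return trx_id not in view.trx_ids
```

For example, with `trx_ids = {10, 20}`, `up_limit_id = 10` and `low_limit_id = 21`, a version written by transaction 8 is visible, while versions written by transaction 10 or by a later transaction 21 are not.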
MVCC operation process
When we query a record, the transaction first gets its version number, which is its own transaction ID, and then generates a ReadView. The query then starts from the newest version in the undo log (the historical snapshots) and compares the trx_id of each version with the ReadView: if the trx_id is in the current active list trx_ids, that version is not visible and the search continues down the chain, until it finds a version whose trx_id is smaller than up_limit_id or otherwise not in the active list.
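The walk down the version chain can be sketched in a few lines of Python. This is only an illustration: `Version` and `first_visible` are made-up names, and the two fields `trx_id` and `prev` stand in for the hidden fields that InnoDB documents as DB_TRX_ID and DB_ROLL_PTR. The visibility rule is passed in as a function so the sketch stays independent of how the ReadView is represented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Version:
    value: str
    trx_id: int                        # hidden field: transaction that wrote this version
    prev: Optional["Version"] = None   # hidden field: pointer to the older version in the undo log


def first_visible(newest: Optional[Version],
                  visible: Callable[[int], bool]) -> Optional[Version]:
    """Follow the chain from newest to oldest and return the first visible version."""
    version = newest
    while version is not None and not visible(version.trx_id):
        version = version.prev         # this version is hidden, step to the older one
    return version                     # None means no version is visible


# Usage: three versions of one row, with transactions 10 and 20 still active.
oldest = Version("v1", trx_id=8)
middle = Version("v2", trx_id=10, prev=oldest)
newest = Version("v3", trx_id=10, prev=middle)
active = {10, 20}
found = first_visible(newest, lambda trx_id: trx_id not in active)
print(found.value)  # v1
```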
This is hard to follow in words alone, so let’s go straight to the picture.
Note:
The 3 isolation levels that MVCC applies to
Read committed: a new ReadView is generated for each query
Repeatable read: until the transaction commits, every query uses the ReadView of the first query
Serializable: same as repeatable read
MVCC read committed process
You can take a look at a screenshot I found online.
Under the read committed isolation level, a new ReadView is generated for each query.
Now there are two transactions, transaction 10 and transaction 20.
You can see that transaction 10 has updated some data and transaction 20 has not updated anything. Each update generates and saves an undo log snapshot. When a transaction reads the row that is currently being modified, a ReadView is created. As mentioned above, a ReadView has 4 key fields: creator_trx_id, trx_ids, up_limit_id, and low_limit_id. Here trx_ids is [10,20], up_limit_id is 10, and low_limit_id is 21.
The query walks the version chain in the undo log. The first version is Wang Wu with trx_id 10. This ID is in the trx_ids of the ReadView, which means its transaction has not committed, so this version cannot be read, and the search moves down the chain. Li Si's version was also written by an active transaction. When the search reaches Zhang San's version with trx_id 8, that ID is not in the active list, which means it has been committed and can be read, so Zhang San's data is returned!
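The walkthrough above can be replayed with a small Python sketch. The helper names `new_read_view` and `select` are hypothetical, Li Si's trx_id of 10 is an assumption (the text only says that version belongs to an active transaction), and creator_trx_id is left out because the reading transaction has written nothing. The second query, issued after transaction 10 commits, is a hypothetical continuation that shows why read committed builds a new view each time.

```python
def new_read_view(active_ids, max_trx_id):
    """Build the ReadView fields for one query (creator_trx_id omitted for brevity)."""
    active = set(active_ids)
    low_limit_id = max_trx_id + 1
    return {
        "trx_ids": active,
        "up_limit_id": min(active) if active else low_limit_id,
        "low_limit_id": low_limit_id,
    }


def select(chain, view):
    """Return the newest value in the chain that the view is allowed to see."""
    for value, trx_id in chain:
        if trx_id < view["up_limit_id"]:
            return value   # committed before every active transaction
        if trx_id < view["low_limit_id"] and trx_id not in view["trx_ids"]:
            return value   # committed before the view was generated
    return None


# Undo log version chain of the row, newest first: (value, trx_id).
chain = [("Wang Wu", 10), ("Li Si", 10), ("Zhang San", 8)]

# First query: transactions 10 and 20 are both active.
first_view = new_read_view(active_ids=[10, 20], max_trx_id=20)
first_result = select(chain, first_view)
print(first_result)   # Zhang San

# Hypothetical continuation: transaction 10 commits, then the same query runs again.
# Read committed generates a fresh view, so the committed update becomes visible.
second_view = new_read_view(active_ids=[20], max_trx_id=20)
second_result = select(chain, second_view)
print(second_result)  # Wang Wu
```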
Now let’s look at the repeatable read level: MVCC solves the problems of phantom reads and non-repeatable reads.
MVCC repeatable read process
Again, it is easier to just look at the pictures.
We can see that the value obtained is Zhang San, the same as in the read committed example above.
The ReadView generated by this query is: trx_ids is [10,20]; up_limit_id is 10; low_limit_id is 21
Because the current isolation level is repeatable read, subsequent queries in this transaction will reuse this ReadView
We find that the undo log version chain has changed, but the ReadView is still the one generated by the first query, and the trx_ids in that view are still [10,20]
Therefore, when the trx_id of each undo log version is compared with the ReadView, Zhang San's data is still the result!
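The difference between the two levels comes down to when the view is generated, which the following Python sketch tries to show. The `Reader` class is a hypothetical stand-in for a transaction, and the scenario (transaction 10 commits between two queries, with Li Si's version assumed to carry trx_id 10) is an assumption chosen for illustration. The view is stored as the pair (trx_ids, low_limit_id), which is enough to decide visibility for a reader that writes nothing itself.

```python
class Reader:
    """A reading transaction under 'RC' (read committed) or 'RR' (repeatable read)."""

    def __init__(self, isolation):
        self.isolation = isolation
        self.view = None

    def select(self, chain, active_ids, max_trx_id):
        # RC: generate a new view for every query.
        # RR: generate the view on the first query and reuse it afterwards.
        if self.isolation == "RC" or self.view is None:
            self.view = (frozenset(active_ids), max_trx_id + 1)
        trx_ids, low_limit_id = self.view
        for value, trx_id in chain:
            # Visible if written before the view was generated by a transaction
            # that was not active at that moment.
            if trx_id < low_limit_id and trx_id not in trx_ids:
                return value
        return None


chain = [("Wang Wu", 10), ("Li Si", 10), ("Zhang San", 8)]
rr, rc = Reader("RR"), Reader("RC")

# First query: transactions 10 and 20 are active.
rr_first = rr.select(chain, active_ids={10, 20}, max_trx_id=20)
rc_first = rc.select(chain, active_ids={10, 20}, max_trx_id=20)

# Transaction 10 commits, so only transaction 20 is still active.
rr_second = rr.select(chain, active_ids={20}, max_trx_id=20)
rc_second = rc.select(chain, active_ids={20}, max_trx_id=20)

print(rr_first, rr_second)  # Zhang San Zhang San
print(rc_first, rc_second)  # Zhang San Wang Wu
```

Under repeatable read the second query still compares against trx_ids [10,20], so the result does not change inside the transaction.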
MVCC phantom read process
Once the two isolation levels above are understood, phantom reads are easy to understand. Under repeatable read, the ReadView is generated during the first query, so no matter how many times we query, there is only this one view. When new data is inserted and its undo log chain is compared with the ReadView, the trx_id of the inserted data is found among the active IDs, so that data cannot be read. The search then continues down the version chain until it finds a version whose trx_id is not among the active IDs.
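As a rough illustration of why the inserted rows stay invisible, the sketch below runs the same snapshot query before and after two inserts while keeping the view from the first query. The function names, the table layout, and the inserted names Zhao Liu and Qian Qi are all hypothetical. It only models plain snapshot reads, not locking reads.

```python
def visible_value(chain, trx_ids, low_limit_id):
    """Return the newest visible value of one row, or None if no version is visible."""
    for value, trx_id in chain:
        if trx_id < low_limit_id and trx_id not in trx_ids:
            return value
    return None


def snapshot_select(table, trx_ids, low_limit_id):
    """Return every row that has at least one version visible to the view."""
    rows = {}
    for key, chain in table.items():
        value = visible_value(chain, trx_ids, low_limit_id)
        if value is not None:   # a row with no visible version does not exist for this view
            rows[key] = value
    return rows


# One committed row; each row maps to its version chain, newest first.
table = {1: [("Zhang San", 8)]}

# View generated by the first query: trx_ids [10,20], low_limit_id 21.
view = (frozenset({10, 20}), 21)
before = snapshot_select(table, *view)

# Active transaction 10 inserts a row, and a later transaction 21 inserts another.
table[2] = [("Zhao Liu", 10)]
table[3] = [("Qian Qi", 21)]

# Repeatable read reuses the same view, so neither new row shows up.
after = snapshot_select(table, *view)
print(before)  # {1: 'Zhang San'}
print(after)   # {1: 'Zhang San'}
```

A newly inserted row has only one version, so when that version is hidden there is nothing further down the chain to fall back to, and the row is simply absent from the result.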