1. MySQL infrastructure

1. MySQL logical architecture

Compared with other databases, MySQL's biggest advantage lies in its flexibility. This flexibility comes from its architecture design that separates storage and computing . MySQL separates query processing and other system tasks from storage/extraction, making it possible to select appropriate data storage methods according to different usage scenarios.

Overall, MySQL can be divided into two parts: the Server layer and the storage engine layer.

The interaction between MySQL and the client

MySQL sends results while it reads them ("read while sending"): each row read is put into net_buffer, and when the buffer is full its contents are sent to the client. The default size of net_buffer is 16K.

MySQL therefore never holds the complete result set in memory. If the client does not read the data in time, sending blocks, which in turn pauses the query's reading of data, so even a huge result set will not blow up the server's memory.
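As a rough sketch (assuming a stock MySQL configuration), the buffer described above can be inspected like this:

    -- Inspect the buffer MySQL fills while "reading and sending"
    SHOW VARIABLES LIKE 'net_buffer_length';    -- defaults to 16384 bytes (16K)
    SHOW VARIABLES LIKE 'max_allowed_packet';   -- upper limit the buffer may grow to for a single packet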

1.1 Server layer

The Server layer includes the connector, query cache, analyzer, optimizer, executor, and so on. All built-in functions, stored procedures, views, and triggers are also implemented in this layer, which covers most of MySQL's core functionality.

Connector

The connection between a client and the MySQL server is based on TCP. The connector is responsible for establishing and maintaining connections with clients, checking permissions, and so on. The communication protocol is half-duplex, and the server does not return the entire query result at once but streams it back in batches.
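A minimal sketch of how the connector's work can be observed from the client side (standard MySQL statements, no custom objects assumed):

    SHOW PROCESSLIST;                       -- one row per connection the connector is maintaining
    SHOW VARIABLES LIKE 'max_connections';  -- upper bound on concurrent connections
    SHOW VARIABLES LIKE 'wait_timeout';     -- idle connections are dropped after this many seconds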

Query cache

Before parsing a query, MySQL first checks whether the statement has already been executed by looking it up in the cache. Previously executed statements are cached in memory as key-value pairs: the key is a hash of the query text, the client protocol version, and other information that could affect the result, and the value is the query result.

Query caching often does more harm than good, because its hit conditions are demanding:

  1. Any character difference in the query text causes a cache miss
  2. Non-deterministic values, such as function calls like NOW(), also cause cache misses
  3. Any update to a table invalidates every cached query on that table

In addition, cache operations also involve locking.
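For reference, a hedged sketch of inspecting and bypassing the query cache (note that the query cache was removed entirely in MySQL 8.0; the table name t below is hypothetical):

    SHOW VARIABLES LIKE 'query_cache_type';     -- whether the query cache is enabled (MySQL 5.7 and earlier)
    SELECT SQL_NO_CACHE * FROM t WHERE id = 1;  -- skip the cache for this one statement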

Analyzer

After the cache misses, MySQL will start to actually execute the statement. The main responsibilities of the analyzer are:

  1. Lexical analysis: identify what each token in the SQL string means, for example recognizing the identifier after FROM as a table name.
  2. Syntax analysis: check whether the statement conforms to SQL grammar; if not, MySQL returns "You have an error in your SQL syntax"

The analyzer finally generates a parse tree.
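A small illustration of the analyzer at work, using a deliberately misspelled keyword on a hypothetical table t:

    SELECT * FORM t WHERE id = 1;   -- "FORM" is not valid SQL
    -- ERROR 1064 (42000): You have an error in your SQL syntax; ...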

Optimizer

After the analyzer, MySQL knows what the SQL statement is asking it to do. Before execution, the optimizer processes the statement and generates an execution plan. The optimizer's job is to minimize the query cost. For example, it will:

  1. Choose the right index
  2. Choose an appropriate join order for the tables involved

The unit of cost is the cost of randomly reading a 4K data page. Several factors can lead the optimizer to choose a poor execution plan:

  1. Inaccurate statistics, such as the number of pages in each table and index and the cardinality of each index; these statistics are provided by the storage engine
  2. The optimizer does not consider whether pages are already in the cache, so it cannot know how many physical IOs will actually be needed
  3. The optimizer does not exhaustively evaluate every possible execution plan
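EXPLAIN shows the plan the optimizer ends up choosing. The table user and index idx_age below are hypothetical, just to illustrate reading the output:

    EXPLAIN SELECT * FROM user WHERE age = 30;
    -- key: the index the optimizer picked (e.g. idx_age); rows: its estimate of rows to examine
    SELECT * FROM user FORCE INDEX (idx_age) WHERE age = 30;  -- override a poor index choice if needed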

Executor

The executor works through the execution plan step by step, calling storage engine interfaces, assembling the rows that satisfy the conditions into a result set, and returning it to the client. In the slow query log, the Rows_examined field records how many rows the executor scanned in total.
Operations such as join, order by, and group by are also carried out in the executor.
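A hedged sketch of turning on the slow query log, where Rows_examined can then be observed per statement:

    SET GLOBAL slow_query_log = ON;
    SET GLOBAL long_query_time = 1;   -- log statements slower than 1 second
    -- each slow-log entry then records Query_time, Lock_time, Rows_sent and Rows_examined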

1.2 Storage Engine

The storage engine is responsible for storing and retrieving MySQL's data. Storage engines are designed as plugins. InnoDB is MySQL's default storage engine; the previous default was MyISAM. InnoDB is a transactional engine with features such as crash recovery and is also the most widely used storage engine. MyISAM supports neither row-level locking nor crash recovery, but it is not useless: it offers features such as compressed tables and GIS spatial functions. Unless you need a feature that InnoDB lacks, you should prefer InnoDB. For example, if you do not care about crash recovery but are sensitive to how much space InnoDB takes up, MyISAM can be an option.
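Because the engine is chosen per table, switching engines is just DDL. A minimal sketch (table names are hypothetical):

    SHOW ENGINES;                                                -- lists available engines; InnoDB is marked DEFAULT
    CREATE TABLE t_orders (id INT PRIMARY KEY) ENGINE = InnoDB;
    CREATE TABLE t_logs   (id INT PRIMARY KEY) ENGINE = MyISAM;
    ALTER TABLE t_logs ENGINE = InnoDB;                          -- converting later rewrites the whole table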

The key features of InnoDB are:

  1. Change buffer
  2. Doublewrite buffer
  3. Adaptive hash index
  4. Flush neighbor pages
  5. Asynchronous IO

2. Concurrency Control

Whenever there are multiple queries that modify data at the same time, concurrency control problems will arise. MySQL performs concurrency control at two levels: Server layer and storage engine layer.

2.1 Read-write lock

When handling concurrent reads and writes, the classic solution is a lock system made up of two kinds of locks: read locks, also known as shared locks, and write locks, also known as exclusive locks. Read locks are shared: multiple read operations do not interfere with or block one another. Write locks are exclusive: a write lock blocks other read locks and write locks alike.

In MySQL, when a user modifies a certain part of the data, MySQL will lock it to prevent other users from reading or modifying the same data.
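In InnoDB these two kinds of locks can also be taken explicitly on rows; a sketch against a hypothetical table t:

    BEGIN;
    SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;  -- shared (read) lock; spelled FOR SHARE in MySQL 8.0
    SELECT * FROM t WHERE id = 2 FOR UPDATE;          -- exclusive (write) lock on another row
    COMMIT;                                           -- both locks are released here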

2.2 Lock granularity

The basic way to improve the concurrency of shared resources is to reduce the granularity of locks, make locked objects more selective, and try to lock only part of the data that needs to be modified. Under the premise of a given resource, the smaller the amount of locked data, the higher the concurrency of the system.

However, locking also consumes system resources. Various lock operations, including obtaining locks, checking locks, and releasing locks, will increase the overhead of the system. If the system spends a lot of time managing locks instead of processing data, the performance of the system will be greatly affected. At this time, it is necessary to find a balance between lock overhead and data security, and choose an appropriate lock strategy.

Most databases generally impose row-level locks on tables to provide better concurrency performance when resource competition is fierce. MySQL provides more flexibility. The storage engine can implement its own lock strategy and lock granularity to provide better performance for specific application scenarios.

2.2.1 Common lock strategies

Table lock

Table locks are MySQL's most basic locks and the cheapest to acquire. Before performing a write operation, a session first needs to obtain the write lock, which locks the entire table and blocks all other users' reads and writes on that table. When no write lock is held, read locks do not block one another.

MySQL implements table locks at the Server layer. Although a storage engine can manage its own locks, when statements such as ALTER TABLE are executed, MySQL uses a Server-layer table lock regardless of the storage engine's locking mechanism.
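A minimal sketch of explicit Server-layer table locks on a hypothetical table t:

    LOCK TABLES t READ;    -- other sessions can still read t, but every write to t is blocked
    UNLOCK TABLES;
    LOCK TABLES t WRITE;   -- blocks all other sessions' reads and writes on t
    UNLOCK TABLES;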

Row lock

Row-level locks allow the highest degree of concurrency, and they also carry the highest locking overhead. Row-level locks can only be implemented at the storage engine layer; the Server layer knows nothing about the locking inside the storage engine. InnoDB implements row-level locking.
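A sketch of InnoDB row locking on a hypothetical table t: only the touched row is locked, so other rows stay writable by other sessions.

    BEGIN;
    UPDATE t SET name = 'x' WHERE id = 1;  -- exclusive row lock on the row with id = 1 only
    -- another session can update id = 2 concurrently; it blocks only if it touches id = 1
    COMMIT;                                -- row locks are released at commit or rollback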

2.3 Deadlock

A deadlock occurs when multiple threads each hold locks that the others need and wait for one another to release them. For example: thread 1 holds lock A and requests lock B, while thread 2 holds lock B and requests lock A (this pattern is sketched in SQL at the end of this section). Once a deadlock occurs, the threads wait forever; unless an external factor intervenes, the deadlock cannot resolve itself.

Four necessary conditions for deadlock :

  • Mutual exclusion: a resource can only be held by one thread at a time.
  • Non-preemption: a resource cannot be forcibly taken away from its holder before the holder is done with it.
  • Hold and wait: a thread keeps the resources it has already acquired while blocking on requests for additional resources.
  • Circular wait: the threads form a cycle in which each waits for a resource held by the next.

Preventing deadlock means breaking one of the four necessary conditions:

Acquire all resources at once, i.e. obtain all needed locks up front — breaks hold and wait.

Release the locks already held whenever a new lock cannot be acquired — breaks non-preemption.

Most database systems implement some form of deadlock detection and lock-wait timeout. InnoDB detects the circular dependency of a deadlock, returns an error immediately, and rolls back the transaction that holds the fewest row-level locks.
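The two-session pattern from the example above, sketched against a hypothetical table t, together with the two InnoDB settings involved:

    -- Session 1
    BEGIN;
    UPDATE t SET v = v + 1 WHERE id = 1;      -- locks row 1
    UPDATE t SET v = v + 1 WHERE id = 2;      -- waits: session 2 holds row 2

    -- Session 2 (in parallel)
    BEGIN;
    UPDATE t SET v = v + 1 WHERE id = 2;      -- locks row 2
    UPDATE t SET v = v + 1 WHERE id = 1;      -- closes the cycle; InnoDB returns
                                              -- ERROR 1213 (40001): Deadlock found when trying to get lock

    SHOW VARIABLES LIKE 'innodb_deadlock_detect';    -- cycle detection, ON by default
    SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';  -- fallback lock-wait timeout (default 50 seconds)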

3. Transactions

3.1 Transaction concept

A transaction is a set of operations that satisfies the ACID properties and can safely move the database from one consistent state to another. Just as finer-grained locking increases system overhead, transactions guarantee data safety at the price of extra overhead. Transactions are implemented by the storage engine; the Server layer does not manage transactions.

A transactional system must support ACID:

  1. Atomicity: the transaction is treated as the smallest indivisible unit; either all of its operations succeed or all of them fail (illustrated in the transfer sketch after this list).
  2. Consistency: the transaction moves the database from one consistent state to another; the database's integrity constraints hold both before and after the transaction.
  3. Isolation: until a transaction commits, its changes are invisible to other transactions; transactions are isolated from one another.
  4. Durability: once a transaction commits, its result is permanent. This is a guarantee of high reliability, not high availability (it does not protect against, say, the disk being destroyed).
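A minimal transfer sketch showing atomicity and durability, using a hypothetical account table:

    BEGIN;                                                     -- or START TRANSACTION
    UPDATE account SET balance = balance - 100 WHERE id = 1;
    UPDATE account SET balance = balance + 100 WHERE id = 2;
    COMMIT;                                                    -- both changes become durable together
    -- ROLLBACK;                                               -- would instead undo both (atomicity)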

Concurrency issues

  1. Lost update: two transactions modify the same data and the later one overwrites the earlier one's change. Solved with optimistic or pessimistic locking (see the sketch after this list).
  2. Dirty read: transaction A modifies a row without committing, and transaction B then reads it. If A rolls back, B has read dirty data. (Reading uncommitted data.)
  3. Non-repeatable read: A reads a row, then B modifies it. When A reads the same row again, the result differs from the first read. (Reading data before and after an update.)
  4. Phantom read: A reads a range of rows, then B inserts or deletes rows within that range. When A reads the range again, the result differs from the first read. (Reading a range before and after inserts or deletes.)
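As noted in item 1, a lost update is commonly avoided with optimistic or pessimistic locking; a sketch using a hypothetical version column on the account table:

    -- Optimistic locking: the update only succeeds if nobody changed the row in between
    UPDATE account
    SET    balance = 900, version = version + 1
    WHERE  id = 1 AND version = 5;              -- affects 0 rows if another transaction got there first
    -- Pessimistic alternative: SELECT ... FOR UPDATE to lock the row before modifying it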

Isolation level

The SQL standard defines four isolation levels:

  1. Read uncommitted: a transaction's modifications are visible to other transactions even before it commits.
  2. Read committed: a transaction's modifications are invisible to other transactions until it commits.
  3. Repeatable read: within the same transaction, reading the same data multiple times always returns the same result.
  4. Serializable: transactions are serialized and executed one after another, with no interference between them.

Isolation level    | Dirty read | Non-repeatable read | Phantom read
Read uncommitted   | possible   | possible            | possible
Read committed     | ×          | possible            | possible
Repeatable read    | ×          | ×                   | possible
Serializable       | ×          | ×                   | ×

(× = cannot occur at that level)

In InnoDB, however, the default isolation level is repeatable read, and because InnoDB uses Next-Key Locks, the phantom read problem is also solved, effectively reaching the serializable isolation level.
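A sketch of checking and changing the isolation level (the variable is tx_isolation before MySQL 8.0 and transaction_isolation from 8.0 on):

    SELECT @@transaction_isolation;                           -- REPEATABLE-READ by default in InnoDB
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- affects the current connection only
    SET GLOBAL  TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- affects new connections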

3.2 Transaction log

Most storage engines use a transaction log to make transaction processing more efficient. When the storage engine modifies table data, it only modifies the cached pages in memory and does not flush them to disk right away; before modifying the cache, it appends a record to the transaction log. This approach is called WAL (Write-Ahead Logging): the log is written first. Because sequential IO is much faster than random IO, the transaction log greatly improves system performance.

The redo log in InnoDB is one such transaction log.
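How aggressively the redo log is flushed to disk is configurable; a hedged sketch of the relevant settings:

    SHOW VARIABLES LIKE 'innodb_log_file_size';            -- size of each redo log file
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    -- 1 (default): flush the redo log to disk at every commit, the safest setting
    -- 0 or 2: the flush to disk happens roughly once per second, trading a little durability for speed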

4. MVCC multi-version concurrency control

Most of MySQL's transactional storage engines do not rely on simple row-level locking alone; they generally use MVCC to improve concurrency. And not only MySQL: PostgreSQL, Oracle, and other databases also implement MVCC.

MVCC can be considered as a variant of row-level locking, but in many cases the locking operation is avoided, so the overhead is smaller.

MVCC uses consistent view snapshots so that the data seen throughout a transaction stays consistent. Depending on when they start, different transactions may see different data.
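A two-session sketch of a consistent snapshot under repeatable read (hypothetical account table):

    -- Session A
    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    SELECT balance FROM account WHERE id = 1;             -- say this returns 1000

    -- Session B then runs and commits:
    -- UPDATE account SET balance = 500 WHERE id = 1;

    -- Session A again
    SELECT balance FROM account WHERE id = 1;             -- still 1000: snapshot read, no lock taken
    SELECT balance FROM account WHERE id = 1 FOR UPDATE;  -- current read: sees 500 and locks the row
    COMMIT;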


