Proficient in MySQL architecture

  • Lao Liu is a graduate student who is about to start job hunting. He taught himself big data development, and along the way he felt deeply that the quality of big data material online is very uneven, so he wanted to write a detailed big data development guide. This guide explains the [basic knowledge] [framework analysis] [source code understanding] of big data in his own words, so that readers can learn by themselves and never have to beg others for help.
  • The big data development guide address is as follows:
    • github:https://github.com/BigDataLaoLiu/BigDataGuide
    • Code Cloud: https://gitee.com/BigDataLiu/BigDataGuide
  • Your likes are my motivation to keep updating. No freeloading, please; if you read it, you will gain something. If you need to get in touch, contact the public account: Lao Liu Who Works Hard.

Today I will share with you the first MySQL article of the big data development basics, and Lao Liu will cover something different from the others! Many readers know MySQL's basics and how to use it, but know little about the principles inside. We absolutely cannot learn knowledge only at the surface, so Lao Liu will do his best to explain MySQL architecture to everyone!

The outline of MySQL Architecture is as follows:

After reading this content of Lao Liu, I hope you can master the following:

  1. MySQL components and their functions
  2. The simplified and the detailed MySQL execution flows
  3. The differences between MyISAM and InnoDB and their usage scenarios
  4. The concept and functions of MySQL log files

1. Logical architecture

Logical architecture diagram

First, the MySQL logical architecture diagram. We can see that MySQL is composed of many modules, each of which plays an important role. The concept and function of each module are introduced below.

Connector

Connectors are the client interfaces through which programs written in different languages (for example, via JDBC or ODBC) send SQL to MySQL.

System management and control tools

Their role covers administrative tasks such as backup, recovery, and cluster management.

Connection pool

Manages client connections and performs authentication, authorization checks, and the like.

SQL interface

After receiving a SQL command (such as DDL or DML), the SQL interface returns the results the user asked for. But a received SQL command is just text; for the system to recognize what the statement wants to do, it must first be parsed, and that is what the Parser is for.

Parser

Parsing is divided into lexical analysis and syntax analysis; an example illustrates both.

After a SQL command is passed to the parser, the parser validates and parses it. Lexical analysis runs first, splitting a statement such as select * from t1 into the tokens select, *, from, t1. Syntax analysis then checks whether those tokens form a valid statement and builds a syntax tree; if they do not, the statement is rejected as malformed.
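You can see the parser at work by feeding it an invalid statement. A sketch, reusing the t1 table name from the example above; the exact error text varies by version:

```sql
-- The token "selct" is not valid SQL, so lexical/syntax analysis
-- fails with an error along the lines of:
--   ERROR 1064 (42000): You have an error in your SQL syntax ...
selct * from t1;
```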

Query optimizer

If the syntax check in the previous step passes, the statement is handed to this part. Before the SQL statement is actually executed, MySQL assumes your statement may not be optimal and optimizes it. The SQL execution plan you view with explain is generated by the query optimizer!

For example: select * from tuser where name like 'a%' and id = 1;
This statement will be optimized. Why it gets optimized will be explained later; for now, just know that it becomes a statement like this:
     select * from tuser where id = 1 and name like 'a%';
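To see which plan the optimizer actually chose, ask MySQL for the execution plan with explain. A small sketch, assuming a hypothetical tuser table shaped like the example above:

```sql
-- Hypothetical table matching the example statement.
CREATE TABLE tuser (
    id   INT PRIMARY KEY,
    name VARCHAR(64),
    KEY idx_name (name)
);

-- The output shows the optimizer's decisions: the chosen index,
-- the access type, the estimated row count, and so on.
EXPLAIN SELECT * FROM tuser WHERE name LIKE 'a%' AND id = 1;
```

Because id = 1 matches the primary key exactly, the optimizer will normally report a const access on the primary key rather than scanning idx_name.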

Query cache

Stores the results of queries, keyed not by the SQL text itself but by a hash of the SQL statement. If the same query arrives again, the result is taken directly from the cache instead of going down to the pluggable storage engine. (In practice it is not very useful, and it has been removed entirely in MySQL 8.0.)
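On MySQL 5.x you can inspect the query cache through server variables and status counters; on 8.0 these no longer exist. A sketch:

```sql
-- Whether the cache is on, and how big it is (MySQL 5.x only;
-- the query cache was removed entirely in MySQL 8.0).
SHOW VARIABLES LIKE 'query_cache%';

-- Hit and miss counters for the cache.
SHOW STATUS LIKE 'Qcache%';
```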

Storage engine

The pluggable storage engine layer: MySQL provides a variety of storage engines. A storage engine determines how data is stored, indexed, and updated.

In MySQL, there are two main storage engines: MyISAM and InnoDB.

MyISAM is a high-speed engine with fast inserts and queries, but it does not support transactions, row locks, and so on;

InnoDB has been MySQL's default storage engine since version 5.5. It supports transactions, row-level locking, rollback, crash recovery, and multi-version concurrency control. It is slightly slower than MyISAM in processing speed and supports foreign keys.

So how do we choose the storage engine type?

InnoDB: supports transaction processing, foreign keys, crash recovery, and concurrency control. If you have relatively high requirements for transaction integrity (for example, banking) or need concurrency control (for example, ticket sales), InnoDB is generally chosen. If you need to update and delete rows frequently, InnoDB is also a good choice because it supports transaction commit and rollback.

MyISAM: inserting data is fast, and space and memory usage are relatively low. If a table is mainly used to insert new records and read them back, choosing MyISAM can achieve high processing efficiency.

Below, Lao Liu puts up a picture of the differences between MyISAM and InnoDB:
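The engine is chosen per table when it is created, and can be changed afterwards. A sketch along the lines of the scenarios above (table and column names are made up for illustration):

```sql
-- Bank-style balance changes need commit/rollback, so InnoDB.
CREATE TABLE account (
    id      INT PRIMARY KEY,
    balance DECIMAL(12,2) NOT NULL
) ENGINE=InnoDB;

-- An insert-and-read-mostly log table can trade transactions for speed.
CREATE TABLE access_log (
    id  INT AUTO_INCREMENT PRIMARY KEY,
    msg VARCHAR(255)
) ENGINE=MyISAM;

-- Check or change the engine of an existing table.
SHOW TABLE STATUS LIKE 'account';
ALTER TABLE access_log ENGINE=InnoDB;
```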

Simplified execution flow chart

How do you remember the execution flow chart? Remember the order in which each module in the logical architecture diagram executes!

  1. Client : sends commands to the connector, which performs permission verification. Once verified, the client can continue sending SQL commands.
  2. Connector : responsible for establishing the connection with the client and obtaining its privileges.
    • If the username or password is incorrect, you receive an "Access denied for user" error.
    • If the username and password pass, the connector looks up your privileges in the privilege tables.
  3. Query cache : once the connection is established, you can execute select statements, and execution first goes to the query cache. If the result was cached earlier, it is returned directly.
  4. Parser : if the query cache misses, the statement actually has to be executed: lexical analysis first, then syntax analysis.
  5. Optimizer : after the parser, MySQL knows what you want to do. Before execution starts, the statement passes through the optimizer, which, among other things, decides which index to use when the table has more than one.
  6. Executor : the parser established what you want to do and the optimizer how to do it, so now the executor starts executing the statement. Note: before execution starts, MySQL first checks whether you have the required privileges on this table; if you do, execution continues, and if you do not, an error is returned. With the privileges confirmed, the table is opened and the executor calls the interfaces provided by the table's storage engine, as defined for that table.
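The permission checks in steps 2 and 6 can be observed directly from a client. A sketch:

```sql
-- The privileges the connector looked up for this session.
SHOW GRANTS FOR CURRENT_USER();

-- A user without privileges on a table fails at the executor's
-- check with an error such as:
--   ERROR 1142 (42000): SELECT command denied to user ...
```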

Detailed execution flow chart

After the short version of the execution flow chart, it may feel like enough. But when Lao Liu was studying, there was also a detailed execution flow chart, so Lao Liu will walk through that process too.

  1. After MySQL starts, the network interaction module waits in the connection management module. Once a connection comes in, it goes through the connection process module and then to the user module, which checks whether you have user permissions; if the check passes, the result is returned to the connection management module and you can log in.
  2. Next, the SQL statement is sent to the user module, which again checks whether you have permission to operate on the table. If you do, the statement goes to the command dispatcher and then to the query cache module; if the same query was executed before, the cached result is returned directly. (When the command reaches the command dispatcher, it also goes down to the logging module, which records the log.)
  3. The command then reaches the command parser, which determines what kind of statement it is. Depending on the statement type, it enters the corresponding module: the query optimizer, the table change module, the table maintenance module, the replication module, or the status module.
  4. The SQL statement then reaches the access control module, which checks once more whether you have the specific operation permission (insert permission, update permission, etc.). If the permission check passes, the statement enters the table management module, which calls the storage engine interface; the storage engine fetches the data (that is, reads it from the file system) and returns it.

At this point, the logical architecture diagram is finished. Work through the ideas and the logic, and you will be able to remember it in one pass.

2. Physical structure

In terms of physical structure, MySQL's files can be divided into log files and data/index files. On Linux, both live in the /var/lib/mysql directory. Log files are stored using sequential IO, while data files are stored using random IO.
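You can confirm where a running server keeps these files; a sketch:

```sql
-- The data directory holding the data and index files
-- (by default /var/lib/mysql on Linux).
SHOW VARIABLES LIKE 'datadir';
```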

Here is a question: Why are log files stored using sequential IO and data files are stored using random IO?

  • First, a brief word on sequential IO and random IO. Sequential IO uses physically contiguous storage space and is very efficient when content is appended in order. Random IO is logically contiguous but not physically contiguous; every operation on the content first has to locate the file's position on disk.
  • Simply put, the advantage of sequential IO is fast writing, though data can only be appended. This suits log files especially well, because the characteristics of log files are equally clear: they record log information and never need their data modified. The disadvantage is wasted space. Data files, by contrast, often need to be modified and their storage addresses are not contiguous, which suits random IO; random IO saves space, but is somewhat slower.

Log files

The following introduces each of the logs among the log files.

Error log (errorlog)

It is enabled by default and records all serious errors encountered during operation, as well as detailed information about each startup and shutdown of MySQL.
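A quick way to find the error log on a running server:

```sql
-- Path of the error log file.
SHOW VARIABLES LIKE 'log_error';
```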

Binary log (binlog)

This is too important, everyone must remember!

It is disabled by default. It records all DDL and DML statements in the database, but not the content of select statements. DDL statements are written to the binlog directly, while DML statements reach the binlog only when their transaction commits. It is mainly used to implement MySQL master-slave replication, data backup, and data recovery.
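A sketch of how to check and enable binary logging (the file name mysql-bin is illustrative):

```sql
-- Is binary logging on? (OFF by default before MySQL 8.0.)
SHOW VARIABLES LIKE 'log_bin';

-- To enable it, put something like this in my.cnf under [mysqld]
-- and restart the server:
--   server-id = 1
--   log-bin   = mysql-bin

-- List the binlog files and the current write position; a slave
-- starts replicating from exactly this file and position.
SHOW BINARY LOGS;
SHOW MASTER STATUS;
```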

General query log

It is disabled by default. It records all of the user's operations, including inserts, deletes, updates, and queries. Under heavy concurrency it generates a large amount of information, resulting in unnecessary disk IO that hurts MySQL performance.

Slow query log

It is disabled by default. It records all queries whose execution time exceeds long_query_time seconds, collecting the SQL statements with long query times so they can be used to improve query performance.
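A sketch of turning the slow query log on for a running server (the 2-second threshold is just an example):

```sql
-- Not persistent across restarts; put the same settings in my.cnf
-- to keep them.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;  -- log anything slower than 2 seconds

-- Where the slow query log is written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```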

Redo log

It is mainly used to guarantee the durability of transactions. If the server fails before dirty pages are written to disk, MySQL replays the redo log when the service restarts, thereby achieving transaction durability.

Rollback log (undo log)

It saves the version of the data from before a transaction modified it. It can be used for rollback and provides the reads for multi-version concurrency control (MVCC).
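The undo log is what makes ROLLBACK work; a sketch, assuming a hypothetical InnoDB table t with id and balance columns:

```sql
START TRANSACTION;
-- The pre-update version of the row is kept in the undo log, so
-- concurrent MVCC readers still see the old balance.
UPDATE t SET balance = balance - 100 WHERE id = 1;
-- Restores the row from its undo-log version.
ROLLBACK;
```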

Relay log

As far as Lao Liu knows, it is useful in two places: MySQL master-slave replication, and canal's synchronization of MySQL incremental data. The main mechanism is that the slave server's I/O thread reads the master server's binary log and records it into a local file on the slave, the relay log. The slave server's SQL thread then reads the relay log's content and applies it on the slave, keeping the slave's data consistent with the master's.
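On a slave you can watch both threads through the replication status; a sketch:

```sql
-- Run on the slave. Key fields: Relay_Log_File and Relay_Log_Pos
-- (where the I/O thread is writing), and Exec_Master_Log_Pos
-- (how far the SQL thread has applied).
SHOW SLAVE STATUS\G
```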

Data files

InnoDB data files

  • .frm file: mainly stores table-related metadata, primarily the table structure definition.
  • .ibd file: with file-per-table tablespaces, stores the table's data and index information; one table corresponds to one .ibd file.
  • .ibdata file: with a shared tablespace, stores table data and index information; all tables share one or more ibdata files.
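Which of the two layouts InnoDB uses is controlled by a server variable; a sketch:

```sql
-- ON: each table gets its own .ibd file (file-per-table).
-- OFF: table data goes into the shared ibdata files.
SHOW VARIABLES LIKE 'innodb_file_per_table';
```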

MyISAM data files

  • .frm file: mainly stores table-related metadata, primarily the table structure definition.
  • .myd file: mainly stores the table's data.
  • .myi file: mainly stores the index trees for the data in the table's data file.

Summary

As the first MySQL article of the big data development guide, this article introduced MySQL architecture in detail, covering each module and each execution flow. Lao Liu hopes everyone can follow the article, work through the ideas, and strive to explain this knowledge in their own words!

Although Lao Liu's current level may not match the experts yet, he will work hard to get better, so that all his friends can learn by themselves and never have to beg for help!

If you have any related questions, please contact the public account: Lao Liu Who Works Hard. Since you have read this far, give it a like and a follow to show your support!

Origin blog.csdn.net/qq_36780184/article/details/113176728