MySQL in Practice: How Is Data Stored and Queried?


How MySQL stores and queries data

When we execute an SQL statement, it goes through the following process, and in the end the data is stored on disk in the form of files.
When we create a student table in the database, the following files will be created in the file system.
This data is ultimately persisted to a file, so how is it organized inside the file? Is each row simply appended to the end of the file, one after another? No. InnoDB stores data in pages, and each page is 16 KB by default. A table consists of many pages, and these pages are organized into a B+ tree.

A table is stored as follows

For ease of management, rows are grouped into pages, pages into extents, and extents into segments.
Page: the basic unit in which InnoDB manages disk space. Each InnoDB page is 16 KB by default.

Extent: 64 contiguous pages. With 16 KB pages, each extent is exactly 1 MB.

Segment: common segment types include the data segment (the leaf nodes of the B+ tree), the index segment (the non-leaf nodes of the B+ tree) and the rollback segment.
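The sizes above reduce to simple arithmetic. The Python sketch below assumes the default 16 KB page size (innodb_page_size can be configured differently when an instance is initialized) and that pages are laid out back to back in a tablespace file, so page N starts at byte N * 16384. The helper names page_offset and extent_of are made up for illustration.

```python
PAGE_SIZE = 16 * 1024        # default InnoDB page size in bytes (assumed)
PAGES_PER_EXTENT = 64        # contiguous pages per extent

EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT  # 1,048,576 bytes = 1 MB


def page_offset(page_no):
    """Byte offset of a page in a tablespace file (pages stored back to back)."""
    return page_no * PAGE_SIZE


def extent_of(page_no):
    """Index of the extent that contains the given page."""
    return page_no // PAGES_PER_EXTENT


print(EXTENT_SIZE // (1024 * 1024), "MB per extent")  # 1 MB per extent
print(page_offset(3))    # 49152
print(extent_of(130))    # 2
```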

Let's start at the micro level and look at how a page stores data.

The format of a page is as follows

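Name                 Description
File Header          General information about the page, such as its page number and the numbers of the previous and next pages
Page Header          Status information specific to the data page, such as the number of records and directory slots
Infimum + Supremum   Two virtual records representing the minimum and maximum records in the page
User Records         The actual row records
Free Space           Space in the page that has not been used yet
Page Directory       Slots that index the user records
File Trailer         Checksum used to verify that the page was written completely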

As rows are inserted, space is taken from Free Space and used for User Records. Once Free Space is used up, the page is full and new records have to go to another page.
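To get a feel for how much of a page is actually available for rows, the sketch below subtracts the fixed-size parts of the page and then fills Free Space with rows. The sizes used (38-byte File Header, 56-byte Page Header, 26 bytes for the Infimum and Supremum records in the compact row format, 8-byte File Trailer) are the commonly documented values and should be treated as assumptions; the estimate ignores the Page Directory and assumes every row takes the same number of bytes. fill_page is a hypothetical helper.

```python
PAGE_SIZE = 16 * 1024
FILE_HEADER = 38          # assumed sizes of the fixed parts of a page
PAGE_HEADER = 56
INFIMUM_SUPREMUM = 26
FILE_TRAILER = 8

# Bytes shared by User Records, Free Space and the Page Directory
USABLE = PAGE_SIZE - FILE_HEADER - PAGE_HEADER - INFIMUM_SUPREMUM - FILE_TRAILER


def rows_per_page(row_bytes):
    """Rough upper bound on rows per page; ignores Page Directory slots."""
    return USABLE // row_bytes


def fill_page(row_bytes, n_rows):
    """Insert rows by moving space from Free Space to User Records.

    Returns (rows_stored, free_space_left); rows that do not fit need another page.
    """
    free_space, stored = USABLE, 0
    for _ in range(n_rows):
        if free_space < row_bytes:
            break  # Free Space exhausted: the page is full
        free_space -= row_bytes
        stored += 1
    return stored, free_space


print(USABLE)               # 16256
print(fill_page(100, 200))  # (162, 56)
```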

The records in a page are chained into a singly linked list in ascending primary-key order (their physical positions inside the page do not have to follow that order).
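A minimal model of the in-page record list, assuming the chain starts at the Infimum record and ends at the Supremum record as the table above describes. Inserting walks from Infimum to the right position and splices the new record in, so the chain always stays in ascending key order. Record and PageRecords are hypothetical names, and duplicate-key checks are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    key: float                        # primary key value
    next: Optional["Record"] = None   # next record in ascending key order


class PageRecords:
    """Records of one page chained from Infimum to Supremum (simplified)."""

    def __init__(self):
        self.supremum = Record(float("inf"))
        self.infimum = Record(float("-inf"), self.supremum)

    def insert(self, key):
        # Find the last record with a smaller key, then splice the new one after it
        prev = self.infimum
        while prev.next.key < key:
            prev = prev.next
        prev.next = Record(key, prev.next)

    def keys(self):
        out, cur = [], self.infimum.next
        while cur is not self.supremum:
            out.append(cur.key)
            cur = cur.next
        return out


page = PageRecords()
for k in [5, 1, 3]:
    page.insert(k)
print(page.keys())  # [1, 3, 5]
```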

Some readers may ask: if my table has no primary key, how is its data organized?

  1. If the table has a unique index whose columns are all NOT NULL, the first such index is used as the primary key.
  2. Otherwise, InnoDB adds a hidden column called row_id and uses it as the primary key.
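The rule for choosing the key that organizes the rows can be sketched as a small Python function. UniqueIndex and clustered_key are hypothetical names, and the sketch assumes the unique indexes are passed in the order they were defined. Internally InnoDB names the hidden column DB_ROW_ID; the sketch keeps the article's name row_id.

```python
from dataclasses import dataclass


@dataclass
class UniqueIndex:
    name: str
    columns: list      # column names covered by the index
    not_null: bool     # True if every column is declared NOT NULL


def clustered_key(primary_key, unique_indexes):
    """Return the columns used to organize rows, following the rules above."""
    if primary_key:
        return primary_key
    for idx in unique_indexes:      # first qualifying index wins
        if idx.not_null:
            return idx.columns
    return ["row_id"]               # hidden column added by InnoDB


print(clustered_key(None, [UniqueIndex("uk_email", ["email"], True)]))  # ['email']
```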

Pages are connected to one another by a doubly linked list: each page records the numbers of its previous and next pages in its File Header.
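The doubly linked list lets a scan move through pages in either direction. In this sketch (hypothetical Page class and page numbers), logically adjacent pages deliberately have non-consecutive page numbers, because neighbours in the list do not have to be physically adjacent in the file.

```python
class Page:
    """A data page holding sorted keys, linked to its neighbours (simplified)."""

    def __init__(self, page_no, keys):
        self.page_no = page_no
        self.keys = keys
        self.prev = None   # previous page in key order
        self.next = None   # next page in key order


def link(pages):
    """Chain pages in key order, setting both prev and next pointers."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left
    return pages[0], pages[-1]


def scan_forward(first):
    """Ascending scan: follow next pointers from the first page."""
    out, page = [], first
    while page is not None:
        out.extend(page.keys)
        page = page.next
    return out


def scan_backward(last):
    """Descending scan: follow prev pointers from the last page."""
    out, page = [], last
    while page is not None:
        out.extend(reversed(page.keys))
        page = page.prev
    return out


first, last = link([Page(4, [1, 2, 3]), Page(9, [4, 5, 6]), Page(7, [7, 8])])
print(scan_forward(first))   # [1, 2, 3, 4, 5, 6, 7, 8]
print(scan_backward(last))   # [8, 7, 6, 5, 4, 3, 2, 1]
```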
When we search for a record within a page, do we have to walk the linked list one record at a time?

Of course not. To speed up searches, InnoDB divides the records into groups, and the page directory stores the address of the largest record in each group. Each such directory entry is called a slot.

How many records does each group contain?

  1. The first group contains exactly 1 record (the Infimum record).
  2. The last group, which ends with the Supremum record, contains 1 to 8 records.
  3. Each of the remaining groups contains 4 to 8 records.
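The grouping rules can be checked with a short sketch. build_groups is a hypothetical greedy scheme chosen only because it always produces a valid layout: Infimum goes alone into the first group, middle groups get 4 records each, and whatever is left (always 1 to 8 records, including Supremum) forms the last group. InnoDB does not build groups this way; it splits an over-full group as records are inserted.

```python
INFIMUM = float("-inf")   # virtual minimum record
SUPREMUM = float("inf")   # virtual maximum record


def build_groups(user_keys):
    """Hypothetical greedy grouping that satisfies the three size rules."""
    groups = [[INFIMUM]]                    # rule 1: Infimum alone
    rest = sorted(user_keys) + [SUPREMUM]   # Supremum always ends the last group
    while len(rest) > 8:
        groups.append(rest[:4])             # rule 3: a middle group of 4 records
        rest = rest[4:]
    groups.append(rest)                     # rule 2: 1 to 8 records remain
    return groups


def is_valid(groups):
    """Check the size rules for the first, middle and last groups."""
    middle_ok = all(4 <= len(g) <= 8 for g in groups[1:-1])
    return len(groups[0]) == 1 and 1 <= len(groups[-1]) <= 8 and middle_ok


def slots(groups):
    """Page directory: one slot per group, holding the group's largest record."""
    return [g[-1] for g in groups]


groups = build_groups(range(1, 17))
print(slots(groups))   # [-inf, 4, 8, 12, inf]
```

With user records 1 to 16 this gives five slots whose largest records are Infimum, 4, 8, 12 and Supremum, so slots 2 and 3 hold 8 and 12 as in the search example.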

As shown in the figure below, the blue part is the primary key and its corresponding data.

To find a record, InnoDB first binary-searches the page directory to find the group that contains it, and then traverses the linked list within that group.

For example, suppose we want to find the user record whose primary key is 10, and the page directory has 5 slots numbered 0, 1, 2, 3 and 4. The search proceeds as follows:

  1. The middle slot is (0 + 4) / 2 = 2. The largest record in slot 2 is 8, and 8 < 10, so the search continues in the slots after slot 2.
  2. The middle of slots 3 and 4 is (3 + 4) / 2 = 3 (integer division). The largest record in slot 3 is 12, and 12 > 10, so the target record is in slot 3's group.
  3. Because the records are linked in one direction only, we cannot walk backwards from record 12. Instead, we start from record 8, the largest record in slot 2, and follow the linked list forward until we reach the target record.
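The three steps above can be sketched in Python. The page contents are assumptions chosen to match the example (user records 1 to 16, five slots whose largest records are Infimum, 4, 8, 12 and Supremum). find_slot is a binary search for the first slot whose largest record is at least the target; for target 10 it probes slot 2 and then slot 3, as in the steps. search then walks the one-way list starting from the largest record of the preceding slot.

```python
INFIMUM = float("-inf")
SUPREMUM = float("inf")

# Assumed page: records 1..16; each slot holds its group's largest record
SLOT_MAX = [INFIMUM, 4, 8, 12, SUPREMUM]
CHAIN = [INFIMUM] + list(range(1, 17)) + [SUPREMUM]
NEXT = {a: b for a, b in zip(CHAIN, CHAIN[1:])}   # one-way "next record" links


def find_slot(slot_max, target):
    """Binary search for the first slot whose largest record is >= target."""
    low, high = 0, len(slot_max) - 1
    while low < high:
        mid = (low + high) // 2
        if slot_max[mid] < target:
            low = mid + 1    # target lies after this slot
        else:
            high = mid       # target lies in this slot or an earlier one
    return low


def search(slot_max, next_of, target):
    """Locate the slot, then walk forward from the previous slot's largest record."""
    slot = find_slot(slot_max, target)
    cur = slot_max[slot - 1] if slot > 0 else INFIMUM
    while cur < target:
        cur = next_of[cur]
    return cur == target


print(find_slot(SLOT_MAX, 10))       # 3
print(search(SLOT_MAX, NEXT, 10))    # True
```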

The page directory makes searching within a page fast, but would searching a whole table still be slow? After all, we would have to walk through every data page along the linked list.

Of course MySQL does not let that happen. Just as we built a directory for records to speed up searching, we can build a directory for pages. Each directory entry consists of the smallest primary key in a page and that page's number (which serves as the page's address).
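Looking up a page in such a directory is a binary search for the last entry whose minimum key is not greater than the key we want. The directory contents and page numbers below are hypothetical; bisect_right comes from Python's standard bisect module.

```python
from bisect import bisect_right

# Hypothetical directory: (smallest primary key in the page, page number)
DIRECTORY = [(1, 10), (4, 16), (7, 28), (11, 35)]


def page_for(directory, key):
    """Return the page that may hold key, or None if key is below the first entry."""
    min_keys = [entry[0] for entry in directory]
    i = bisect_right(min_keys, key) - 1   # last entry whose min key <= key
    return directory[i][1] if i >= 0 else None


print(page_for(DIRECTORY, 5))   # 16
```

The result only says which page could contain the key; the in-page search still decides whether the record exists.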

Directory entries are also stored in 16 KB data pages, so a large table may need several directory pages. When there are too many directory pages, we can in turn build a directory over them, as shown in the figure below.
Isn't this a tree? Leaf nodes store the records, while non-leaf nodes store primary keys and the corresponding page numbers. This tree is in fact a B+ tree.

Taking the tree in the figure above as an example, suppose we query the record whose primary key is 5. The lookup goes as follows:

We start at the root page, which directs us to page 30; page 30 in turn directs us to page 16, a leaf page, where we finally find the record.
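Descending the tree repeats the same directory lookup at every level until a leaf page is reached. In this sketch the root page number (99) and the contents of every page are hypothetical; only the path through pages 30 and 16 mirrors the example above. A key smaller than every entry is sent to the leftmost child.

```python
from bisect import bisect_right

# Hypothetical tree: internal pages hold (min key, child page number);
# leaf pages map primary key -> row.
PAGES = {
    99: ("internal", [(1, 30), (10, 32)]),            # root page
    30: ("internal", [(1, 10), (4, 16), (7, 28)]),
    32: ("internal", [(10, 40)]),
    10: ("leaf", {1: "row 1", 2: "row 2", 3: "row 3"}),
    16: ("leaf", {4: "row 4", 5: "row 5", 6: "row 6"}),
    28: ("leaf", {7: "row 7", 8: "row 8", 9: "row 9"}),
    40: ("leaf", {10: "row 10", 11: "row 11"}),
}
ROOT = 99


def lookup(pages, root, key):
    """Descend from the root to a leaf; return (row or None, pages visited)."""
    path, page_no = [], root
    while True:
        path.append(page_no)
        kind, content = pages[page_no]
        if kind == "leaf":
            return content.get(key), path
        min_keys = [k for k, _ in content]
        i = max(bisect_right(min_keys, key) - 1, 0)   # leftmost child for tiny keys
        page_no = content[i][1]


print(lookup(PAGES, ROOT, 5))   # ('row 5', [99, 30, 16])
```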

Next, let's look at data storage from a macro perspective, which will make later problems easier to analyze.

When MyISAM stores data, the data and the index are kept separately (in the .MYD and .MYI files), and the B+ tree stores the address of each corresponding record.
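The MyISAM arrangement can be modelled with a byte buffer standing in for the data file and a sorted list standing in for the index B+ tree, where each index entry holds the row's byte offset rather than the row itself. This is a simplified, hypothetical model (insert_row and find_row are made-up names), not MyISAM's actual file format.

```python
from bisect import bisect_left

data_file = bytearray()   # stands in for the data file: rows appended in arrival order
index = []                # stands in for the index: sorted (key, offset, length)


def insert_row(key, row):
    offset = len(data_file)            # the row's address in the data file
    data_file.extend(row)
    pos = bisect_left(index, (key,))
    index.insert(pos, (key, offset, len(row)))


def find_row(key):
    pos = bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        _, offset, length = index[pos]
        return bytes(data_file[offset:offset + length])   # jump to the address
    return None


insert_row(3, b"carol")
insert_row(1, b"alice")
insert_row(2, b"bob")
print(find_row(2))   # b'bob'
```

Note that the data buffer keeps rows in insertion order while only the index is sorted, which is the separation the paragraph above describes.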
When InnoDB stores data, the data and the primary-key index are stored together, forming the clustered index. You can also create non-clustered (secondary) indexes on other columns. The leaf nodes of the clustered index store complete user rows, while the leaf nodes of a non-clustered index store the indexed column values and the corresponding primary key values.

When querying by primary key, only the clustered index needs to be traversed. When querying through a non-clustered index, InnoDB first traverses the non-clustered index to find the record's primary key value, and then traverses the clustered index with that value to fetch the row. This second lookup is called returning to the table.
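The difference between the two lookup paths can be sketched by counting index traversals. Here a dict stands in for the clustered index and a sorted list of (name, primary key) pairs for a hypothetical secondary index on a name column; a real InnoDB table uses B+ trees for both, and the table contents are made up.

```python
from bisect import bisect_left

# Clustered index stand-in: primary key -> complete row
CLUSTERED = {1: {"id": 1, "name": "alice", "age": 20},
             2: {"id": 2, "name": "bob", "age": 22},
             3: {"id": 3, "name": "carol", "age": 21}}
# Secondary index stand-in: sorted (name, primary key), no other columns
SECONDARY_NAME = sorted((row["name"], pk) for pk, row in CLUSTERED.items())


def by_primary_key(pk):
    """One traversal: the clustered index already holds the whole row."""
    return CLUSTERED.get(pk), 1


def by_name(name):
    """Secondary index -> primary key -> clustered index (returning to the table)."""
    pos = bisect_left(SECONDARY_NAME, (name,))
    if pos == len(SECONDARY_NAME) or SECONDARY_NAME[pos][0] != name:
        return None, 1                  # not found after the first traversal
    row, _ = by_primary_key(SECONDARY_NAME[pos][1])
    return row, 2


print(by_name("bob"))   # ({'id': 2, 'name': 'bob', 'age': 22}, 2)
```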



Original article: blog.csdn.net/zzti_erlie/article/details/123754624