2020 Autumn Recruitment_Database (mysql) Learning Record


MySQL index and query optimization summary

index

An index is a structure that sorts the values of one or more columns in a database table. With an index, specific information in a table can be accessed quickly.

Ordinary index and unique index

Both unique indexes and ordinary indexes use a B-tree structure, so lookups run in O(log n) time.

The only task of an ordinary index (defined with the keyword KEY or INDEX) is to speed up data access. Therefore, you should create indexes only on the columns that appear most frequently in query conditions (WHERE column = ...) or sort conditions (ORDER BY column). Whenever possible, choose a column with compact, regular data (such as an integer column) to index.

Ordinary indexes allow the indexed column to contain duplicate values. If you can be sure that a column will only ever contain distinct values, use the keyword UNIQUE to define a unique index on it when creating the index.
The benefits of doing so: first, it simplifies MySQL's management of the index and makes it more efficient; second, when a new record is inserted, MySQL automatically checks whether the new value already appears in that column of an existing record, and refuses the insert if it does. In other words, a unique index guarantees the uniqueness of data records. In fact, in many cases a unique index is created not to improve access speed but to prevent duplicate data.
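A minimal sketch of the two index kinds (the table and column names are made up for illustration):

```sql
-- Hypothetical table for illustration
CREATE TABLE `user` (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(100) NOT NULL,
  city VARCHAR(50)
);

-- Ordinary index: only speeds up lookups; duplicate values are allowed
CREATE INDEX idx_city ON `user` (city);

-- Unique index: also guarantees no duplicate values in the column
CREATE UNIQUE INDEX uk_email ON `user` (email);

-- The second row below would be rejected with a duplicate-key error:
-- INSERT INTO `user` (email) VALUES ('a@b.com'), ('a@b.com');
```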

constraint

The difference between primary key constraint and unique constraint

Both primary key constraints and unique constraints create a unique index. The difference is that the index key of a primary key constraint is by definition not allowed to be NULL, while the index key of a unique constraint may be NULL.

Primary key constraint: the column does not allow null values (NULL); a table can have at most one primary key constraint.
Unique constraint: the column allows null values (NULL); a table can have one or more unique constraints.

The difference between primary key, foreign key and index

The role of SQL's primary key and foreign key

  • The primary key is a unique identifier that determines a record. For example, if a record includes an ID number, name, and age, only the ID number uniquely identifies a person; the other fields may repeat, so the ID number is the primary key.
  • A foreign key is used to associate with another table; it is a field that identifies records of another table and is used to maintain data consistency. For example, if a field in table A references the primary key of table B, that field is a foreign key of table A.

Definition:
Primary key: uniquely identifies a record; cannot repeat and cannot be NULL.
Foreign key: a field that references the primary key of another table; it may repeat and may be NULL.
Unique index: the indexed field has no duplicate values, but it may contain NULL.

Role:
Primary key: used to ensure data integrity.
Foreign key: used to establish relationships with other tables.
Index: used to improve the speed of queries and sorting.

Number:
Primary key: a table can have only one primary key.
Foreign key: a table can have multiple foreign keys.
Index: a table can have multiple unique indexes.
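The three concepts above can be sketched together in DDL (table and column names are hypothetical):

```sql
CREATE TABLE customer (
  id INT PRIMARY KEY,              -- primary key: unique, NOT NULL, one per table
  id_card CHAR(18) UNIQUE          -- unique index: no duplicates, but NULL allowed
);

CREATE TABLE `order` (
  id INT PRIMARY KEY,
  customer_id INT,                 -- foreign key column: may repeat, may be NULL
  FOREIGN KEY (customer_id) REFERENCES customer (id)
) ENGINE = InnoDB;                 -- foreign key enforcement requires InnoDB
```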

Three database design normal forms

Three database normal forms (important)
The simplest explanation of the three database normal forms

  1. First normal form: fields (columns) are atomic and cannot be subdivided.
    The first normal form is the most basic requirement of all relational databases: a table must be two-dimensional, not nested (that is, no field can itself be split into sub-fields).

  2. Second normal form: on the premise of satisfying the first normal form, every non-key field must fully depend on the entire primary key.

  3. Third normal form: on the premise of satisfying the second normal form, non-key fields must not depend on each other. Every column relates directly to the primary key, with no transitive dependencies.
    The second and third normal forms mainly avoid data redundancy, insertion anomalies, and deletion anomalies. Usually the table is split so that the resulting tables satisfy the second and third normal forms.
    First normal form: an example that violates the first normal form:

    Table: field1, field2 (field2.1, field2.2), field3 ...
    

Second normal form: an example that violates the second normal form:

    Table: student ID, course ID, name, credits

    This table clearly mixes two entities: student information and course information. Non-key fields must fully depend on the (composite) primary key, but here credits depends only on the course ID and name depends only on the student ID, so the table violates the second normal form.

Third normal form: an example that violates the third normal form:

     Table: student ID, name, age, college, college phone

      A transitive dependency exists: (student ID) → (college) → (college address, college phone)
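A sketch of splitting the table above to remove the transitive dependency (table and column names are illustrative):

```sql
-- College attributes move into their own table...
CREATE TABLE college (
  college_id INT PRIMARY KEY,
  name VARCHAR(50),
  phone VARCHAR(20)
);

-- ...so every non-key field of `student` now depends directly on student_id
CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT,
  college_id INT,
  FOREIGN KEY (college_id) REFERENCES college (college_id)
);
```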

Query

The difference between B-trees and B+ trees

Why do file systems and databases use B-/B+ trees for indexing?

Database data is generally stored on disk, and disk I/O is far more time-consuming than memory I/O, so the number of disk I/O operations should be minimized.

The number of I/O operations depends on the height of the tree. Compared with a red-black tree, a B-/B+ tree node can have many children (the maximum number of children is called the order of the B-tree), which makes the tree much shorter than a red-black tree and thus reduces the number of I/O operations.

The time of an operation on a B-/B+ tree consists of disk-access time plus CPU computation time. Since the CPU is very fast, the efficiency of a B-tree operation depends on the number of disk accesses: for the same total number of keys, the smaller the height of the B-tree, the less time is spent on disk I/O.
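A rough worked example of the height argument (the numbers are purely illustrative): for N keys and a fan-out of m children per node, the height is approximately

```latex
h \approx \lceil \log_m N \rceil
```

With m = 1000 and N = 10^9 keys, h ≈ ⌈log_1000(10^9)⌉ = 3, so a lookup costs about three disk reads; a balanced binary tree over the same keys would be roughly log2(10^9) ≈ 30 levels tall.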

Why is B+ tree more suitable for database indexing than B tree?

The B+ tree is a variant of the B-tree that originated in file systems (directory levels and multi-level file indexes, where only the bottom leaf nodes, the files, hold data): non-leaf nodes store only keys used for indexing, and the actual data is stored only in the leaf nodes.
All non-leaf nodes can be regarded as the index part.

Why mysql uses B+ tree

  • Why not use a hash? -> unordered.
    A hash is faster than a tree for both reads and writes, so why use a tree structure for the index? Because for grouping, sorting, and range comparisons the time complexity of a hash index degrades to O(n), and such queries appear frequently in real business workloads.

  • Why not use a binary tree? -> the tree is too tall.
    Each node of a binary tree has only two children and stores only one record. As the amount of data grows, the height of the tree grows significantly, and the taller the tree, the slower the query. In a B-tree, each node can have many children and store multiple records, so the height of the tree is reduced, and it takes full advantage of the principle of locality: data near the queried data is likely to be used soon. This principle underlies disk read-ahead, which reduces disk I/O.

  • Why not use a B-tree? -> every node stores data, so queries may traverse within every level.
    B-tree nodes hold many keys and the number of levels is small, so compared with a binary tree the number of disk I/O operations drops; but because every node stores data, queries must traverse nodes in order, which is not the fastest way to locate data.

  • Why use a B+ tree rather than a B-tree?
    The B+ tree improves on the B-tree: data is stored only in the leaf nodes, and the leaf nodes are linked together in a list. When fetching nodes, no in-order traversal is required, which makes it easy to locate data quickly and is the best way to reduce disk I/O.
    Why does the MySQL database index choose the B+ tree?
    PS: An answer on Zhihu puts it well: the B-tree improves I/O performance but does not solve the inefficiency of traversing elements, which is the problem the B+ tree was born to solve. A B+ tree can traverse the entire tree just by walking the leaf nodes. Moreover, range queries are very frequent in databases, and the B-tree either does not support them or supports them too inefficiently.

Transactions

A transaction groups a set of database operations (consisting of one or more SQL statements) so that if any statement cannot be executed, none of them take effect. In other words, the statements in a transaction either all execute or none do (atomicity).

The process of using transactions in mysql database

  • Start a transaction with start transaction; or begin;.
  • Control the transaction: by default autocommit is on (autocommit=1). Check it with select @@autocommit; and turn it off with set autocommit=0;. With autocommit off, operations can be undone with a rollback.
  • Commit the transaction manually with commit;. Once committed, it cannot be rolled back (durability).
  • Roll back the transaction manually with rollback;.
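The steps above can be sketched as a single transfer transaction (the account table and values are hypothetical):

```sql
-- Transfer 100 yuan from A to B atomically
SET autocommit = 0;           -- optional: disable autocommit for this session
START TRANSACTION;            -- same effect as BEGIN

UPDATE account SET balance = balance - 100 WHERE name = 'A';
UPDATE account SET balance = balance + 100 WHERE name = 'B';

-- Either make both changes permanent...
COMMIT;
-- ...or, at any point before COMMIT, undo both with:
-- ROLLBACK;
```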

Transaction characteristics ACID

A transaction has a strict definition: it must satisfy four properties at once, namely Atomicity, Consistency, Isolation, and Durability, known as the ACID properties of transactions.

  • Atomicity (A): the transaction must be regarded as an indivisible minimal unit of work. Only when all database operations in the transaction succeed is the whole transaction considered successful. If any SQL statement in the transaction fails, the statements that already succeeded must also be undone, and the database returns to the state before the transaction started.
    For example: suppose user A transfers 100 yuan to user B. The transaction requires two SQL statements: deduct 100 yuan from user A and add 100 yuan to user B. These two statements must succeed or fail together, otherwise the data becomes inconsistent.

  • Consistency (C): the transaction must move the database from one consistent state to another consistent state; that is, the database is consistent both before and after the transaction executes.
    Consistency concerns the database's state before and after the transaction, while atomicity concerns the indivisibility of the transaction; the transaction's consistency can be seen as guaranteed by its atomicity.
    For example: suppose the money of user A and user B sums to 3000. No matter how money is transferred between A and B, or how many transfers occur, the sum should still be 3000 after the transactions end.

  • Isolation (I): the transaction the database opens for each user must not be interfered with by the data operations of other transactions; multiple concurrent transactions must be isolated from each other.
    For example: for any two concurrent transactions T1 and T2, from T1's perspective T2 either ends before T1 starts or starts after T1 ends. In other words, no transaction perceives other transactions executing concurrently.

  • Durability (D): once a transaction is committed, its changes are permanently saved in the database, and even a database failure should not affect them. Note that durability cannot be 100% guaranteed: it holds only from the transaction's own perspective, and external causes such as hard-disk damage can still lose committed data.

Isolation level of mysql transaction

MySQL database practical operation tutorial (18): database transactions and their isolation levels -> gives many good examples!

Why do transactions need to set the isolation level

Normally the database is accessed concurrently by multiple threads, so multiple threads can easily open transactions at the same time. In that case dirty reads, non-repeatable reads, and phantom reads may occur. To avoid them, the transaction isolation level must be set.

Four isolation levels of transactions

  • READ UNCOMMITTED: dirty reads can occur; should be avoided in real development.
  • READ COMMITTED: avoids dirty reads, but non-repeatable reads can occur.
  • REPEATABLE READ: solves non-repeatable reads; MySQL's default isolation level.
    PS: In theory phantom reads can still occur at this level, but MySQL's storage engine solves the problem through the multi-version concurrency control (MVCC) mechanism, so this level can avoid phantom reads. At the RR level, InnoDB uses MVCC together with next-key locks to prevent phantom reads.
  • SERIALIZABLE: serializable.
    PS: The highest isolation level; it forces transactions to be ordered so that they cannot conflict, thereby solving dirty reads, phantom reads, and non-repeatable reads. However, this level can cause many timeouts and lock contention and is rarely used in practice.

View and modify operations

View the database isolation level (MySQL 8.0):
Global level: select @@global.transaction_isolation;
Session level: select @@transaction_isolation; or select @@session.transaction_isolation;
Modify the isolation level:
set <global/session> transaction isolation level <READ UNCOMMITTED/READ COMMITTED/REPEATABLE READ/SERIALIZABLE>;

Dirty read

One transaction reads data that another transaction has not yet committed.

Within its own transaction, thread B reads data that thread A's transaction has not committed; thread A can then use ROLLBACK to undo its earlier operations, which is quite dangerous. It is as if thread A is the customer and thread B is the seller: because of the dirty read, the seller mistakenly believes the customer's payment has arrived and ships the goods immediately, but the customer cancels the transfer by rolling back.

Non-repeatable read

A non-repeatable read means that repeatedly reading data committed by other threads within one transaction yields inconsistent results. The cause is that other transactions performed update operations during the query. For example, while the bank prepares a statistical report, the first query of account 9527 shows 1000 yuan and the second shows 900 yuan; the owner of account 9527 withdrew 100 yuan while the statistics were being compiled, leading to inconsistent results across repeated queries.
A non-repeatable read is similar to a dirty read, but a dirty read reads another transaction's uncommitted dirty data, while a non-repeatable read repeatedly reads data committed by other transactions and gets inconsistent results.

Phantom read

A phantom read means the number of rows returned by two queries within one transaction is inconsistent. The cause is that other transactions performed insert operations during the query. For example, while the bank totals all users' balances in the account table, there are three accounts summing to 3000 yuan; meanwhile a new account is added with a deposit of 1000 yuan, so the bank then finds the total has become 4000, producing a phantom read.
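A minimal two-session sketch of a non-repeatable read at the READ COMMITTED level, following the bank example above (table name and values are hypothetical; the two columns represent two concurrent sessions):

```sql
-- Session 1                                -- Session 2
START TRANSACTION;
SELECT balance FROM account
  WHERE id = 9527;   -- returns 1000
                                            UPDATE account SET balance = 900
                                              WHERE id = 9527;
                                            COMMIT;
SELECT balance FROM account
  WHERE id = 9527;   -- now returns 900:
COMMIT;              -- a non-repeatable read
```

Under REPEATABLE READ, session 1's second SELECT would still return 1000, because both reads come from the same snapshot.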

MVCC mechanism of mysql

Introduction

The principle and implementation of the MySQL InnoDB MVCC mechanism:
MVCC (Multiversion Concurrency Control) is a common means of handling read-write conflicts in the engines of modern databases (including MySQL, Oracle, and PostgreSQL), with the goal of improving throughput under high database concurrency.

The opposite of MVCC is lock-based concurrency control. MVCC's biggest advantage: reads take no locks, and reads and writes do not conflict. In read-heavy, write-light OLTP (online transaction processing) applications, non-conflicting reads and writes are very important and greatly increase the system's concurrent performance.

  • The InnoDB engine in MySQL supports MVCC;
  • For highly concurrent transactions, MVCC is more effective than simple row locking and has lower overhead;
  • MVCC works under the Read Committed and Repeatable Read isolation levels;
  • MVCC can be implemented with either optimistic or pessimistic locking.

PS: Pessimistic locking is implemented by locking; optimistic locking has two main implementations: the CAS mechanism (atomic operations) and the version-number mechanism.

Implementation principle

How MySQL MVCC is implemented:
MVCC is realized by keeping two hidden columns with each row record: one holds the row's creation time and the other holds the row's expiration (or deletion) time. What is stored is not an actual time value but a system version number: each time a new transaction starts, the system version number is incremented, and the version number at the start of a transaction serves as the transaction's version, compared against the version number of each row it queries.

Summary: MVCC (Multiversion Concurrency Control) achieves concurrency control by temporarily keeping multiple versions of the same data.

How the RR isolation level solves phantom reads for current reads and snapshot reads

Current read and snapshot read -> interpreting the difference between snapshot reads and current reads.
In a concurrent system that supports MVCC, two kinds of reads must be supported: snapshot reads and current reads.

  • Snapshot read: a plain select. It reads a visible version of the record (possibly out-of-date data) without taking locks.
  • Current read: special read operations and insert/update/delete operations. They read the latest version of the record, and the records returned by a current read are locked to ensure that other transactions do not modify them concurrently.

At the RR (repeatable read) level, snapshot reads are implemented through MVCC (multi-version concurrency control) and the undo log, while current reads are implemented by taking record locks and gap locks.
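The two kinds of reads can be sketched inside one RR transaction (the account table is hypothetical):

```sql
-- Under REPEATABLE READ (the MySQL default):
START TRANSACTION;

-- Snapshot read: plain SELECT, served from the MVCC snapshot, no locks taken
SELECT * FROM account WHERE id = 9527;

-- Current reads: read the latest committed version and take locks
SELECT * FROM account WHERE id = 9527 LOCK IN SHARE MODE;  -- S lock (FOR SHARE in 8.0)
SELECT * FROM account WHERE id = 9527 FOR UPDATE;          -- X lock + next-key locks

COMMIT;
```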

How does InnoDB solve phantom reads?
Conclusion: under the RR isolation level, InnoDB uses MVCC plus next-key locks to solve phantom reads: MVCC solves phantom reads for ordinary (snapshot) reads, and next-key locks solve them for current reads.

PS: In RC mode, MVCC cannot solve phantom reads or non-repeatable reads, because each read refreshes its own snapshot: as soon as another transaction commits, the next read refreshes the snapshot and sees the latest data.

mysql engines and their differences

mysql engine

A database engine is the core service for storing, processing, and protecting data. The engine controls access rights and processes transactions quickly, meeting the requirements of most enterprise applications that handle large amounts of data. With a database engine you can create relational databases for online transaction processing or online analytical processing, including tables for storing data and database objects (such as indexes, views, and stored procedures) for viewing, managing, and securing the data.

The most commonly used mysql engines are InnoDB and MyISAM.

The difference between InnoDB and MyISAM

mysql database engine switch (InnoDB, MyISAM)
With the MySQL 5.7 installed on the system, use show engines; to view the available engines. The default engine is InnoDB, whose features are: support for transactions, row-level locking (row locks), and foreign keys. MyISAM, by contrast, supports none of these: it uses table-level locking and does not support transactions or foreign keys.
MySQL distributed transactions: XA (cross-database transactions) -> MySQL XA introduction
A savepoint is a way of implementing "subtransactions" (also called nested transactions) within database transaction processing. A transaction can be rolled back to a savepoint without affecting the changes made before the savepoint was created, so the whole transaction need not be abandoned. -> MySQL basics: the application of SAVEPOINT
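A minimal savepoint sketch (the account table and values are hypothetical):

```sql
START TRANSACTION;
INSERT INTO account (id, balance) VALUES (1, 100);
SAVEPOINT sp1;                 -- mark a point inside the transaction
INSERT INTO account (id, balance) VALUES (2, 200);
ROLLBACK TO SAVEPOINT sp1;     -- undoes only the second insert
COMMIT;                        -- the first insert is kept
```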

mysql lock

The interviewer asked me if I knew the MySQL lock, and the next 15 minutes made him admire

  • A table lock is the largest-granularity lock in MySQL: the current operation locks the entire table. It costs fewer resources than row locks and cannot deadlock, but the probability of lock conflicts is highest.
  • A row lock is the smallest-granularity lock in MySQL. Because the granularity is so small, the probability of resource contention is lowest and concurrency is highest, but row locks can cause deadlocks, and acquiring and releasing each lock adds overhead.

By usage, row locks are further divided into shared locks (S locks, or read locks) and exclusive locks (X locks, or write locks).

  • Shared lock

Usage notes: if transaction A puts an S lock on data object 1, transaction A can read data object 1 but cannot modify it. Other transactions can only add S locks to data object 1, not X locks, until transaction A releases its S lock. This ensures that other transactions can read data object 1, but cannot modify it, before transaction A releases the S lock.
Syntax: select ... lock in share mode;
A shared lock means that multiple transactions can share a lock on the same data and can all read it, but none can modify it.

  • Exclusive lock

Usage notes: if transaction A puts an X lock on data object 1, transaction A can both read and modify data object 1, and no other transaction can lock data object 1 until transaction A releases its lock. This ensures that other transactions can neither read nor modify data object 1 before transaction A releases the lock.
Syntax: select ... for update;
An exclusive lock cannot coexist with any other lock: if one transaction acquires an exclusive lock on a row, no other transaction can acquire any lock on that row.
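The two row-lock syntaxes side by side (table name is hypothetical):

```sql
-- Shared (S) lock: other transactions may still read and take S locks
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;
-- (in MySQL 8.0 the preferred spelling is SELECT ... FOR SHARE)

-- Exclusive (X) lock: blocks other transactions' S and X lock requests on the row
SELECT * FROM account WHERE id = 1 FOR UPDATE;
```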

redis database

The difference between redis and mysql


Origin blog.csdn.net/XindaBlack/article/details/106668933