The evolution of MySQL index data structures
What is an index
Definition: an index is a data structure, together with the algorithms that maintain it, that organizes data so that the required records can be located quickly instead of by scanning everything.
For example, in the Xinhua Dictionary we can quickly find a character through its radical or its pinyin; the radical and pinyin tables are the dictionary's indexes.
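To make the definition concrete, here is a minimal Python sketch, assuming a hypothetical in-memory table of (id, name) rows. A plain dict (a hash map) plays the role of an index on `name`; it only illustrates the idea of jumping straight to a row instead of scanning, and it is not the structure MySQL uses, which is what the rest of this article works toward.

```python
# Hypothetical "users" table: each row is (id, name).
rows = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]

def full_scan(rows, name):
    """No index: compare against every row until a match is found (O(N))."""
    for row in rows:
        if row[1] == name:
            return row
    return None

# Build the index once: name -> position of the row in the table.
name_index = {row[1]: pos for pos, row in enumerate(rows)}

def indexed_lookup(rows, index, name):
    """With an index: look up the row's position directly instead of scanning."""
    pos = index.get(name)
    return rows[pos] if pos is not None else None

print(indexed_lookup(rows, name_index, "carol"))   # (3, 'carol')
```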
How the data structure behind an index evolved
1. Ordered array
Advantages:
- Elements can be accessed randomly by subscript, and because the array is sorted, a lookup can use binary search in O(logN).
Disadvantages:
- Searching requires loading the entire table into memory, which puts very high pressure on memory.
- To keep the array ordered, inserting or deleting data means shifting every element after that position, which costs O(N).
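A minimal Python sketch of both sides of this trade-off, using the standard `bisect` module on a sorted list. The helpers `search` and `insert_sorted` are illustrative names, not MySQL code.

```python
import bisect

def search(arr, target):
    """Binary search on a sorted list: O(logN) comparisons. Returns the index or -1."""
    i = bisect.bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1

def insert_sorted(arr, value):
    """Insert while keeping the list sorted; returns how many elements had to shift."""
    i = bisect.bisect_left(arr, value)
    shifted = len(arr) - i   # every element from position i onward moves one slot right
    arr.insert(i, value)     # list.insert performs that shift internally
    return shifted

data = [3, 8, 15, 23, 42]
print(search(data, 23))          # 3
print(insert_sorted(data, 5))    # 4
print(data)                      # [3, 5, 8, 15, 23, 42]
```

The lookup touches only O(logN) elements, but inserting 5 near the front moved four existing elements.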
2. Linked list
Advantages:
- The previous or next node can be reached directly through its pointer.
- Deleting a node only requires changing pointers, with no elements to shift, which is better than an array.
Disadvantages:
- Elements cannot be accessed randomly by subscript as in an array.
- Finding a value means traversing from the first node, so a lookup is no faster than scanning the data without any index; in the worst case the whole list is traversed, which is O(N).
3. Binary search tree
A binary search tree has two properties: every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger.
Advantages and disadvantages of a binary search tree:
- Query efficiency is unstable. If the left and right subtrees are roughly balanced, a lookup takes O(logN) in the worst case; if data is inserted in sorted order, the tree degenerates into a linked list and a lookup takes O(N).
- As the data volume grows, the tree gets taller. If each node is stored in its own disk block holding a single record, every level visited costs one disk I/O, so the number of I/Os per lookup grows significantly. This makes the structure unsuitable for storing index data.
[Figures: normal data / abnormal data]
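The degeneration described above can be reproduced with a plain, unbalanced binary search tree. In this sketch (hypothetical helper names; keys 0 to 999 are an arbitrary choice), inserting shuffled keys plays the role of normal data and inserting sorted keys the role of abnormal data. Under the one-node-per-disk-block model above, the height is the number of disk reads a worst-case lookup needs. Insertion is iterative so that the degenerate tree does not hit Python's recursion limit.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Unbalanced BST insert: smaller keys go left, others go right."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right

def height(root):
    """Number of levels, computed with a level-by-level walk."""
    h = 0
    level = [root] if root else []
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

n = 1000
sorted_keys = list(range(n))
shuffled = sorted_keys[:]
random.Random(42).shuffle(shuffled)

print(height(build(sorted_keys)))   # 1000: every node has only a right child, i.e. a linked list
print(height(build(shuffled)))      # far smaller: the tree stays roughly balanced
```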
4. Balanced binary tree (AVL tree)
A balanced binary tree is a special kind of binary search tree, so it also satisfies the two binary search tree properties mentioned earlier, plus one more:
The absolute value of the height difference between its left and right subtrees does not exceed 1, and both left and right subtrees are balanced binary trees.
Compared with a plain binary search tree, an AVL tree keeps its left and right sides balanced and never degenerates into a linked list: no matter in what order data is inserted, rotations performed after each insertion keep the height difference between the left and right subtrees at most 1, so lookups stay O(logN).
However, each node still holds only one key, so when the data volume is very large the tree is still too tall, just like a binary search tree, and a lookup still costs many disk I/Os.
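The rotations that keep an AVL tree balanced can be sketched as follows. This is a standard textbook AVL insertion written in Python for illustration (hypothetical names, integer keys, duplicates sent to the right); it is not how MySQL stores anything. Inserting keys in sorted order, the worst case for the plain tree in the previous section, still yields a height close to log2(N).

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def h(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(h(node.left), h(node.right))

def rotate_right(y):
    """Lift y's left child x above y."""
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x

def rotate_left(x):
    """Lift x's right child y above x."""
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y

def insert(node, key):
    """BST insert, then rebalance each node on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = h(node.left) - h(node.right)
    if balance > 1:                   # left-heavy
        if key >= node.left.key:      # left-right case: rotate the child first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                  # right-heavy
        if key < node.right.key:      # right-left case: rotate the child first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

n = 2**14 - 1
root = None
for k in range(1, n + 1):   # sorted keys: the worst case for a plain BST
    root = insert(root, k)
print(root.height)          # about log2(n), i.e. roughly 14, versus n levels for an unbalanced BST
```

The height still grows with log2(N), though: since 2^20 is about 1.05 million, a binary tree holding a million keys needs at least 20 levels, so under the one-node-per-block model a lookup could cost around 20 disk reads.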
5. B-tree
Derived from the balanced tree, a B-tree stores multiple keys (together with their data) in each node, and each node has multiple child pointers, one on either side of every key. Because each node fans out to many children, the tree stays short even for large data volumes, which solves the tree-height problem.
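The effect of storing many keys per node can be checked with a little arithmetic. A tree whose nodes have at most m children (so at most m - 1 keys each) and whose height is h holds at most m^h - 1 keys. The sketch below finds the smallest height for N keys; the fanout of 1200 is only an assumed, illustrative value, since the real fanout depends on page size and key size.

```python
def min_height(n_keys, fanout):
    """Smallest height h such that a tree whose nodes have at most `fanout` children
    (at most fanout - 1 keys each) can hold n_keys, using capacity(h) = fanout**h - 1."""
    h, capacity = 0, 0
    while capacity < n_keys:
        h += 1
        capacity = fanout ** h - 1
    return h

for n in (10**6, 10**9):
    print(n, min_height(n, 2), min_height(n, 1200))
# 1000000 20 2
# 1000000000 30 3
```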
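However, range queries remain inefficient. For example, searching for the range [15,36] still requires access to 7 disk blocks (1/2/7/3/8/4/9), because the in-order traversal has to keep moving back up to parent nodes to reach the next keys.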
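The extra block accesses come from the shape of the traversal. The sketch below runs a pruned in-order range search on a small hypothetical B-tree (at most two keys per node, made-up node names) and records every node it enters. Because data lives in internal nodes too, the search keeps returning to `R` and `B` between leaves.

```python
class BNode:
    def __init__(self, name, keys, children=()):
        self.name = name                  # label, used only to record the visit order
        self.keys = keys                  # sorted keys stored in this node (data lives here too)
        self.children = list(children)    # len(keys) + 1 children for internal nodes, empty for leaves

def leaf(name, *keys):
    return BNode(name, list(keys))

def range_search(node, lo, hi, out, visited):
    """In-order traversal pruned to [lo, hi]; every node entered is appended to `visited`."""
    visited.append(node.name)
    for i, key in enumerate(node.keys):
        if node.children and lo < key:          # child i holds keys smaller than `key`
            range_search(node.children[i], lo, hi, out, visited)
        if lo <= key <= hi:
            out.append(key)
        if key > hi:                            # everything further right is larger still
            return
    if node.children and node.keys[-1] < hi:    # last child holds keys larger than the last key
        range_search(node.children[-1], lo, hi, out, visited)

root = BNode("R", [17, 35], [
    BNode("A", [8, 12], [leaf("A1", 3, 5), leaf("A2", 9, 10), leaf("A3", 13, 15)]),
    BNode("B", [26, 30], [leaf("B1", 20, 25), leaf("B2", 27, 28), leaf("B3", 31, 33)]),
    BNode("C", [65, 87], [leaf("C1", 36, 60), leaf("C2", 70, 80), leaf("C3", 90, 99)]),
])

out, visited = [], []
range_search(root, 15, 36, out, visited)
print(out)       # [15, 17, 20, 25, 26, 27, 28, 30, 31, 33, 35, 36]
print(visited)   # ['R', 'A', 'A3', 'B', 'B1', 'B2', 'B3', 'C', 'C1']
```

Nine nodes are entered to return twelve keys, and the path alternates between levels rather than moving steadily from left to right.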
6. B+ tree
Building on the B-tree, a B+ tree stores data only in the leaf nodes; internal nodes store only keys, and neighboring leaf nodes are linked by bidirectional pointers.
This fixes the range lookup problem.
Search process:
First descend the tree to the leaf that holds the minimum value of the range, then follow the linked list between leaf nodes, reading data in order until the maximum value of the range is passed.
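The same keys stored in a hypothetical B+ tree show the two-step process: one descent to the leaf that may contain 15, then a left-to-right walk along the leaf chain. As before, this is a Python sketch with made-up node names and a tiny fanout, not InnoDB's page format.

```python
import bisect

class Leaf:
    def __init__(self, name, keys):
        self.name, self.keys = name, keys
        self.prev = self.next = None   # bidirectional pointers between neighboring leaves

class Inner:
    def __init__(self, name, keys, children):
        self.name = name
        self.keys = keys               # separator keys only; no row data
        self.children = children       # child i holds keys < keys[i]; child i+1 holds keys >= keys[i]

def link(leaves):
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a

def range_search(root, lo, hi):
    """Descend once to the leaf that may contain `lo`, then walk the leaf chain."""
    visited, node = [], root
    while isinstance(node, Inner):
        visited.append(node.name)
        node = node.children[bisect.bisect_right(node.keys, lo)]
    out = []
    while node is not None:
        visited.append(node.name)
        for key in node.keys:
            if key > hi:
                return out, visited    # past the upper bound: stop scanning
            if key >= lo:
                out.append(key)
        node = node.next
    return out, visited

leaves = [Leaf(f"L{i}", ks) for i, ks in enumerate(
    [[3, 5, 8], [9, 10, 12], [13, 15, 17], [20, 25, 26], [27, 28, 30],
     [31, 33, 35], [36, 60, 65], [70, 80, 87], [90, 99]], start=1)]
link(leaves)
root = Inner("R", [20, 36], [
    Inner("I1", [9, 13], leaves[0:3]),
    Inner("I2", [27, 31], leaves[3:6]),
    Inner("I3", [70, 90], leaves[6:9]),
])

out, visited = range_search(root, 15, 36)
print(out)       # [15, 17, 20, 25, 26, 27, 28, 30, 31, 33, 35, 36]
print(visited)   # ['R', 'I1', 'L3', 'L4', 'L5', 'L6', 'L7']
```

The range is answered after entering 7 nodes, and once the first leaf is found the walk never returns to an upper level. The `prev` pointers are not needed for an ascending scan; they would serve a descending one.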