(16) Algorithms class for programmers: the role of the B+ tree in database indexes

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reproducing it, please include a link to the original source and this statement.
Link to this article: https://blog.csdn.net/m0_37609579/article/details/100107832

Earlier articles covered binary trees and multiway trees. Balanced binary trees such as AVL trees and red-black trees are very good structures, so why do database indexes not use them? Because there are search trees that perform better for this job. Most database systems and file systems currently use the B-Tree or its variant, the B+Tree, as their index structure.

One, B-tree and B+ tree review

1. B-tree

The B-tree (a multiway search tree) is a common data structure. Using a B-tree significantly reduces the number of intermediate steps needed to locate a record, which speeds up access. The B is usually taken to stand for Balance. This data structure is commonly used for database indexes and is efficient overall.

In a B-tree every node stores both keys and data, so records are spread across all the nodes of the tree, and the child pointers of leaf nodes are null.

B-tree features (for a tree of order M):

  1. The root node has at least two children (unless it is itself a leaf)
  2. Every non-root, non-leaf node has [⌈M/2⌉, M] children
  3. Every non-root node has [⌈M/2⌉-1, M-1] keys, arranged in ascending order
  4. The keys in the child subtree between key[i] and key[i+1] all lie between key[i] and key[i+1]
  5. All leaf nodes are at the same level
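
The properties above can be illustrated with a minimal lookup sketch. The `BTreeNode` class, the `btree_search` function and the sample keys are hypothetical names and values chosen for illustration (a tiny tree of order M = 3); this is not code from any particular database.

```python
from bisect import bisect_left


class BTreeNode:
    """A B-tree node: every node, not just a leaf, holds keys AND data."""

    def __init__(self, keys, values, children=None):
        self.keys = keys                 # keys in ascending order
        self.values = values             # one data item per key
        self.children = children or []   # empty for a leaf, else len(keys) + 1


def btree_search(node, key):
    """Return (value, nodes_visited); value is None when the key is absent."""
    visited = 0
    while True:
        visited += 1
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i], visited   # hit, possibly in an inner node
        if not node.children:
            return None, visited             # reached a leaf without a hit
        node = node.children[i]              # subtree between keys[i-1] and keys[i]


# Hypothetical sample tree: one root and three leaves on the same level.
leaf1 = BTreeNode([5, 8], ["e", "h"])
leaf2 = BTreeNode([15, 18], ["o", "r"])
leaf3 = BTreeNode([25, 30], ["y", "z"])
root = BTreeNode([10, 20], ["j", "t"], [leaf1, leaf2, leaf3])
```

In this sketch a search for key 20 stops at the root, while a search for key 15 has to descend to a leaf, so the number of nodes visited depends on where the key happens to live.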

B-tree advantages:

The advantage of the B-tree is multiway lookup, and this is the specific reason it beats the red-black tree. Each B-tree node holds many keys, while each red-black tree node holds only one. As the amount of data grows, a red-black tree keeps getting taller and its lookups keep getting slower, while a B-tree generally stays short. A B-tree node can hold only N keys, and once it is full it splits. Why does a B-tree split? As data is added the keys of a node fill up, and to preserve the B-tree properties the node has to split, just as red-black trees and AVL trees have to rotate to preserve their own properties.
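
The split described above might look like the following sketch. It assumes the common textbook scheme in which the median key of an overfull node moves up to the parent; `insert_into_node` and `max_keys` are hypothetical names, and the parent update is left out.

```python
from bisect import insort


def insert_into_node(keys, new_key, max_keys):
    """Insert new_key into a sorted key list of one node.

    Returns (left, promoted, right). While the node still fits, promoted is
    None and right is empty. On overflow the median key is promoted to the
    parent and the remaining keys are divided between two nodes.
    """
    keys = list(keys)        # work on a copy
    insort(keys, new_key)    # keep the keys in ascending order
    if len(keys) <= max_keys:
        return keys, None, []
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


no_split = insert_into_node([10, 20], 15, max_keys=3)
split = insert_into_node([10, 20, 30], 25, max_keys=3)
```

Here `no_split` keeps all three keys in one node, while `split` yields two nodes plus the key 25 that would be handed to the parent.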

2. B+ tree

The B+ tree is a variant of the B-tree and is also a multiway search tree; its definition is basically the same as that of the B-tree. The leaf nodes of a B+ tree store the keys together with the addresses of the corresponding records, and the layers above the leaves serve only as an index.


B+ tree: only the leaf nodes store data, the leaf nodes contain every key in the tree, and leaf nodes store no child pointers.

B+ tree features:

  1. All keys appear in the linked list of leaf nodes (a dense index), and the keys in the list are in sorted order;
  2. A lookup can never end with a hit in a non-leaf node;
  3. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes form the data layer that stores the keys and their data;
  4. Better suited to file indexing systems;

Differences between the B-tree and the B+ tree:

  1. The upper limit on pointers per node is 2d rather than 2d+1 (d being the degree of the tree);
  2. Internal nodes store no data, only keys; that is, all keys appear in the leaf nodes;
  3. Leaf nodes store no child pointers, but a chain pointer is added that links all the leaf nodes.
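
To make the chain pointer concrete, here is a small sketch of a range query over a B+ tree. The `Leaf` and `Internal` classes, the `range_scan` function and the sample keys are hypothetical; the sketch also assumes the convention that a separator key is the smallest key of the subtree to its right.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.values = [f"r{k}" for k in keys]  # stand-in for record addresses
        self.next = None                       # chain pointer to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no data
        self.children = children  # len(keys) + 1 children


def find_leaf(node, key):
    """Descend from the root to the leaf that would contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, lo, hi):
    """Collect (key, value) pairs with lo <= key <= hi by walking the leaf chain."""
    leaf = find_leaf(root, lo)
    result = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return result
            if k >= lo:
                result.append((k, v))
        leaf = leaf.next          # follow the chain instead of going back up
    return result


leaves = [Leaf([1, 4]), Leaf([7, 10]), Leaf([13, 16])]
leaves[0].next, leaves[1].next = leaves[1], leaves[2]
root = Internal([7, 13], leaves)
```

After one descent from the root, the scan only moves sideways through the leaves, which is what makes sequential access cheap in this layout.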

B+ tree advantages:

Adding sequential-access pointers to a B+ tree, that is, giving each leaf node a pointer to the adjacent leaf, produces the tree that database systems prefer as their index data structure. There are many reasons, the most important being that the tree is short and fat. Indexes are generally large and are often stored on disk as index files, so looking something up in the index costs disk I/O. Compared with a memory access, an I/O access is several orders of magnitude more expensive, so the most important measure of a data structure as an index is the number of disk I/O operations a lookup performs. The shorter the tree, the fewer I/Os. This is also why the B+ tree is used instead of the B-tree: its internal nodes store no data, so one node can hold more keys.

Two, what decides index lookup speed?

Most of the data in a database is stored on disk. The index itself is generally large too and cannot all be held in memory, so it is usually stored on disk as an index file. An index lookup therefore costs disk I/O, and compared with a memory access an I/O access is several orders of magnitude more expensive. So the most important measure of a data structure as an index is the number of disk I/O operations a lookup performs. The shorter the tree, the fewer I/Os. In other words, the index should be organized to minimize the number of disk I/O accesses during a lookup.

A binary tree is too deep, so a lookup needs many disk I/Os and queries are slow. Each node of a B-tree or B+ tree has up to m children, so compared with a binary tree these trees are much shorter: short and fat!
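
A rough calculation shows how much shorter a multiway tree is. The sketch below assumes an idealized, completely full tree in which every node has `fanout` children and `fanout - 1` keys; `levels_needed` and the figure of one million keys are illustrative choices only.

```python
def levels_needed(n_keys, fanout):
    """Smallest number of levels of a completely full tree that holds n_keys.

    A full tree with `levels` levels stores fanout ** levels - 1 keys.
    Integer arithmetic is used to avoid floating-point rounding in logarithms.
    """
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        capacity = fanout ** levels - 1
    return levels


binary_levels = levels_needed(1_000_000, fanout=2)       # balanced binary tree
multiway_levels = levels_needed(1_000_000, fanout=1024)  # 1024-way tree
```

If every level costs one disk I/O, the binary tree needs about 20 reads for a million keys in this model, while the 1024-way tree needs 2.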

Three, storage engines in MySQL

In MySQL, the two most commonly used storage engines are MyISAM and InnoDB, which are two generations of MySQL storage engines.

They implement indexes differently. In MyISAM the data field of an index entry stores the address of the record, and the index and the data are kept separately. In InnoDB the data field stores the record itself, so the data is also the index.

Primary index and secondary index: the index on the primary key is generally called the primary index, and an index on any other key is called a secondary index.

Four, how MyISAM uses the B+ tree

Primary index:

[Figure: MyISAM primary index]

As the figure shows, col1 is the primary key, and each leaf node stores an address; the record is found by following that address.

Secondary index (the difference between the primary index and a secondary index is that secondary index keys may repeat):

[Figure: MyISAM secondary index]

Five, how InnoDB uses the B+ tree

Primary index:

[Figure: InnoDB primary index]

Note the difference from MyISAM: the data field of each leaf node stores the complete record.

Secondary index:

[Figure: InnoDB secondary index]

Look closely at the difference between the secondary index and the primary index: the leaf nodes of the secondary index store the primary key. This is the biggest difference between MyISAM and InnoDB.
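
The two layouts can be contrasted with a toy model. Plain Python dicts stand in for the B+ trees here, and every name, address and row below is invented for illustration; real engines store these structures in pages on disk.

```python
# MyISAM-style layout: both indexes map a key to a record ADDRESS,
# and the rows live in a separate data file.
myisam_data_file = {0x07: (15, "Alice"), 0x56: (18, "Bob"), 0x6A: (20, "Alice")}
myisam_primary = {15: 0x07, 18: 0x56, 20: 0x6A}
myisam_secondary = {"Alice": [0x07, 0x6A], "Bob": [0x56]}  # keys may repeat


def myisam_find_by_name(name):
    """Secondary index -> address -> row in the data file."""
    return [myisam_data_file[addr] for addr in myisam_secondary.get(name, [])]


# InnoDB-style layout: the primary index leaf holds the ROW itself,
# and the secondary index leaf holds the PRIMARY KEY.
innodb_primary = {15: (15, "Alice"), 18: (18, "Bob"), 20: (20, "Alice")}
innodb_secondary = {"Alice": [15, 20], "Bob": [18]}


def innodb_find_by_name(name):
    """Secondary index -> primary key -> row in the primary index."""
    return [innodb_primary[pk] for pk in innodb_secondary.get(name, [])]
```

Both functions return the same rows; what differs is whether the secondary index leads to an address or to a primary key that is then looked up in a second tree.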

Six, where exactly is InnoDB better than MyISAM?

MyISAM and InnoDB are two generations of MySQL engines, so there must be an improvement between them, and InnoDB is the newer generation. Where exactly does it do better?

Recall that MyISAM and InnoDB are both built on the B+ tree, and that the difference from the B-tree has already been mentioned: the data fields are separated from the internal nodes.

In MyISAM the index and the data file are separate, and the data field of a B+ tree leaf is the address of the record in the file. The primary index and the secondary indexes are all built this way, so if a record's address changes, every index tree has to change. As said earlier, frequent disk reads and writes are inefficient, and the locality principle does not help here, because nodes that are logically adjacent are not necessarily physically adjacent, which lowers efficiency further.

So InnoDB came along. The leaf nodes of its primary index hold the data, while the leaf nodes of its secondary indexes store the primary key. A lookup first finds the primary key through the secondary index and then finds the complete record in the leaf nodes of the primary index. Traversing two trees sounds like a lot of trouble. However, when a record has to move, only the primary index changes and the secondary indexes are left untouched. Also, a node in a real database does not hold as few keys as the nodes in the figures. If a node holds 1024 keys, a B+ tree of height 2 holds 1024 * 1024 keys, so the tree is generally very short and the cost of traversing it is almost negligible!
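
The maintenance argument can be modelled in a few lines. This sketch only illustrates the claim made above and is not how either engine is implemented: dicts stand in for index trees, and the function names, addresses and page labels are all hypothetical.

```python
def move_row_address_based(indexes, old_addr, new_addr):
    """Every index stores the record address, so every index must be rewritten."""
    touched = 0
    for index in indexes:
        for key, addr in list(index.items()):
            if addr == old_addr:
                index[key] = new_addr
                touched += 1
    return touched


def move_row_key_based(primary, pk, new_location):
    """Secondary indexes store the primary key, which does not change."""
    primary[pk] = new_location
    return 1


# Address-based layout (as described for MyISAM): three indexes, one row moves.
a_primary = {15: 0x07, 18: 0x56}
a_by_name = {"Alice": 0x07, "Bob": 0x56}
a_by_city = {"Paris": 0x07, "Rome": 0x56}
address_touched = move_row_address_based(
    [a_primary, a_by_name, a_by_city], 0x07, 0x99
)

# Key-based layout (as described for InnoDB): only the primary index changes.
k_primary = {15: "page 3", 18: "page 4"}
k_by_name = {"Alice": 15, "Bob": 18}
k_by_city = {"Paris": 15, "Rome": 18}
key_touched = move_row_key_based(k_primary, 15, "page 9")
```

In this model the address-based layout rewrites one entry per index, while the key-based layout rewrites a single entry no matter how many secondary indexes exist.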

Seven, summary

1. Why the B+ tree?

  • The index file is large and cannot all be kept in memory, so it has to be stored on disk
  • The index should be organized to minimize the number of disk I/O accesses during a lookup (why the B-/+Tree? It comes down to how disks are accessed; see the points below)
  • Disk read-ahead exploits the principle of locality; the read-ahead length is generally an integer multiple of the page size (in many operating systems the page size is typically 4K)
  • Database systems make clever use of disk read-ahead: the size of a node is set equal to one page, so each node can be loaded completely with a single I/O (a node holds two arrays, so its addresses are contiguous). A structure like the red-black tree has a significantly larger height h, and because logically close nodes (parent and child) may be physically far apart, it cannot exploit locality.
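
One way to picture a node that occupies exactly one page is the sketch below. The 4K page size comes from the text; the node layout (a key count, an array of 8-byte keys and an array of 8-byte child page numbers), the `MAX_KEYS` value and the function names are assumptions made for illustration, and an in-memory `io.BytesIO` buffer stands in for the index file.

```python
import io
import struct

PAGE_SIZE = 4096   # 4K page, as mentioned above
MAX_KEYS = 255     # assumed capacity: 255 keys + 256 child page numbers

# Assumed on-disk layout: key count, then the key array, then the child array.
NODE = struct.Struct(f"<I{MAX_KEYS}q{MAX_KEYS + 1}q")
assert NODE.size <= PAGE_SIZE   # 4 + 255*8 + 256*8 = 4092 bytes


def pack_node(keys, children):
    """Serialize one internal node into exactly one page."""
    padded_keys = keys + [0] * (MAX_KEYS - len(keys))
    padded_children = children + [0] * (MAX_KEYS + 1 - len(children))
    raw = NODE.pack(len(keys), *padded_keys, *padded_children)
    return raw.ljust(PAGE_SIZE, b"\x00")


def read_node(f, page_no):
    """Load a whole node with a single page-sized read."""
    f.seek(page_no * PAGE_SIZE)
    raw = f.read(PAGE_SIZE)
    fields = NODE.unpack(raw[:NODE.size])
    n = fields[0]
    keys = list(fields[1:1 + n])
    children = list(fields[1 + MAX_KEYS:1 + MAX_KEYS + n + 1])
    return keys, children


# Two nodes stored back to back, one page each.
disk = io.BytesIO(pack_node([10, 20], [1, 2, 3]) + pack_node([30, 40, 50], [4, 5, 6, 7]))
node1 = read_node(disk, 1)
```

Because the key array and the child array sit next to each other inside one page, a single read brings in everything needed to choose the next child.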

2. Why is the B+ tree better suited to indexes than the B-tree?

B+ tree disk reads cost less

The internal nodes of a B+ tree hold no pointers to the specific information for a key; that is, no data is stored inside internal nodes. So its internal nodes are smaller than those of a B-tree. If all the keys of one internal node are stored in the same disk block, the block can hold more keys, so more of the keys being searched for are read into memory at once, and the number of I/O reads and writes drops accordingly.
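
A quick back-of-the-envelope calculation makes the difference visible. All sizes below are assumptions picked for illustration (a 4096-byte block, 8-byte keys, 8-byte pointers, 100-byte records stored inline); real engines use different layouts.

```python
BLOCK_SIZE = 4096    # assumed disk block size in bytes
KEY_SIZE = 8         # assumed key size
POINTER_SIZE = 8     # assumed child pointer size
RECORD_SIZE = 100    # assumed size of the data stored with each key


def btree_keys_per_block():
    """B-tree node entry: key + child pointer + the record itself."""
    return BLOCK_SIZE // (KEY_SIZE + POINTER_SIZE + RECORD_SIZE)


def bplus_internal_keys_per_block():
    """B+ tree internal node entry: key + child pointer, no record."""
    return BLOCK_SIZE // (KEY_SIZE + POINTER_SIZE)


btree_fanout = btree_keys_per_block()
bplus_fanout = bplus_internal_keys_per_block()
```

Under these assumptions one block holds 35 B-tree entries but 256 B+ tree internal entries, so each block read narrows the search far more in the B+ tree.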

B+ tree query efficiency is more stable

Non-leaf nodes do not point to the final file content; they are only an index to the keys in the leaf nodes. So every key lookup must follow a path from the root to a leaf node. All key lookups have the same path length, so every query is about equally efficient.

3. Differences between MyISAM and InnoDB in MySQL

  • MyISAM is not transaction-safe, while InnoDB is transaction-safe
  • MyISAM locks at table granularity, while InnoDB supports row-level locking
  • MyISAM supports full-text indexes, while InnoDB has supported them only since MySQL 5.6
  • MyISAM is relatively simple and can be more efficient than InnoDB; consider MyISAM for small applications
  • MyISAM tables are saved as files, which makes them easier to move across platforms
  • MyISAM manages non-transactional tables and offers high-speed storage and retrieval plus full-text search; it is a reasonable choice if the application performs a large number of SELECT operations
  • InnoDB supports transactions with ACID properties; choose it if the application performs a large number of INSERT and UPDATE operations.

Origin www.cnblogs.com/anymk/p/11521516.html