LSM tree (log-structured merge tree) Notes

WAL: Write-Ahead Log, a sequential log file to which each write is appended before it is applied.
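To make the idea of a sequential log file concrete, here is a minimal sketch of a write-ahead log. The class name `WriteAheadLog`, the JSON-lines record format, and the `wal_demo` helper are all assumptions made for illustration; real storage engines use their own binary formats.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Hypothetical append-only log; the JSON-lines format is an assumption."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a", encoding="utf-8")

    def append(self, key, value):
        # Sequential write: each record is appended to the end of the file.
        self.file.write(json.dumps({"key": key, "value": value}) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())  # force the record to stable storage

    def replay(self):
        # After a crash, re-read the log in order to rebuild in-memory state.
        state = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                state[record["key"]] = record["value"]
        return state

    def close(self):
        self.file.close()


def wal_demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    wal = WriteAheadLog(path)
    wal.append("a", 1)
    wal.append("b", 2)
    wal.append("a", 3)  # a later record for the same key wins on replay
    wal.close()

    recovered = WriteAheadLog(path)
    state = recovered.replay()
    recovered.close()
    return state
```

Because records are only ever appended, the log costs one sequential write per operation, and replaying it in order reproduces the state that was in memory before a failure.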

1 Definition of LSM tree

LSM tree: Log-Structured Merge-Tree.

Log-Structured Merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period.

An LSM tree can be viewed as a collection of in-memory trees and on-disk file trees.
For example, the in-memory MemTable can be implemented as a B+ tree, and the SST files at each level on disk can also be organized as B+ trees.

2 Composition of LSM tree

The LSM tree consists of three parts:

  1. MemTable,
  2. Immutable MemTable,
  3. SSTable(Sorted String Table)

2.1 MemTable

MemTable is an in-memory data structure that holds the most recently updated data, kept sorted by key.
It can be implemented as a red-black tree, a skip list, etc.
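As a rough illustration of "sorted by key", the sketch below uses a plain sorted list with `bisect` as a stand-in for a red-black tree or skip list. The `MemTable` class and its method names are hypothetical, chosen only for this example.

```python
import bisect


class MemTable:
    """Hypothetical in-memory table kept sorted by key (sorted list + bisect)."""

    def __init__(self):
        self.keys = []    # sorted list of keys
        self.values = {}  # key -> latest value

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # keep keys ordered on insert
        self.values[key] = value           # an update overwrites in place

    def get(self, key):
        return self.values.get(key)

    def items(self):
        # Iterate in key order, which is what a flush to disk relies on.
        return [(k, self.values[k]) for k in self.keys]


table = MemTable()
for k, v in [("cherry", 3), ("apple", 1), ("banana", 2), ("apple", 10)]:
    table.put(k, v)
```

Keeping the keys ordered in memory means the table can later be written out as an already-sorted file without a separate sort step.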

2.2 Immutable MemTable

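An Immutable MemTable is a MemTable that has been frozen (made read-only) once it reaches its size limit. It serves as an intermediate buffer while the in-memory data is persisted to disk, so new writes can continue in a fresh MemTable during the flush.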
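The hand-off between the active MemTable and the Immutable MemTable might look roughly like the following. `MemTableBuffer`, the entry-count `limit`, and the `flushed` list (which stands in for files on disk) are assumptions for this sketch, not any particular engine's design.

```python
class MemTableBuffer:
    """Hypothetical write buffer: an active table plus a frozen one awaiting flush."""

    def __init__(self, limit=2):
        self.limit = limit      # assumed size threshold, in entries
        self.active = {}        # mutable MemTable, receives new writes
        self.immutable = None   # frozen MemTable, read-only until flushed
        self.flushed = []       # stands in for SSTables on disk

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) >= self.limit:
            self.freeze()

    def freeze(self):
        if self.immutable is not None:
            self.flush()        # make room: finish the pending flush first
        self.immutable = self.active
        self.active = {}        # new writes continue in a fresh table

    def flush(self):
        # Persist the frozen table as one sorted run, then drop it from memory.
        self.flushed.append(sorted(self.immutable.items()))
        self.immutable = None


buf = MemTableBuffer(limit=2)
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    buf.put(k, v)
```

After the three writes, the first two entries sit in the frozen table and the third is in the new active table; calling `buf.flush()` then turns the frozen table into a sorted run.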

2.3 SSTable

An SSTable is an ordered collection of key-value pairs stored in a file. The basic idea of the file structure is to split the data into data blocks, build an index over those blocks, and place the index entries at the end of the file.
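The "data blocks first, index at the end" layout can be sketched as below. The JSON encoding, the 8-byte footer holding the index offset, and the function names `build_sstable` and `sstable_get` are assumptions for illustration; actual SSTable formats differ in detail.

```python
import json
import struct


def build_sstable(sorted_items, block_size=2):
    """Serialize sorted (key, value) pairs as data blocks + index + footer.

    The JSON encoding and the 8-byte footer are assumptions of this sketch.
    """
    data = b""
    index = []  # one entry per block: (first key, offset, length)
    for i in range(0, len(sorted_items), block_size):
        block = json.dumps(sorted_items[i:i + block_size]).encode("utf-8")
        index.append((sorted_items[i][0], len(data), len(block)))
        data += block
    index_bytes = json.dumps(index).encode("utf-8")
    footer = struct.pack("<Q", len(data))  # byte offset where the index starts
    return data + index_bytes + footer


def sstable_get(blob, key):
    """Look up a key by reading the footer, then the index, then one block."""
    (index_offset,) = struct.unpack("<Q", blob[-8:])
    index = json.loads(blob[index_offset:-8].decode("utf-8"))
    # Pick the last block whose first key is <= the search key.
    candidate = None
    for first_key, offset, length in index:
        if first_key <= key:
            candidate = (offset, length)
        else:
            break
    if candidate is None:
        return None
    offset, length = candidate
    block = json.loads(blob[offset:offset + length].decode("utf-8"))
    for k, v in block:
        if k == key:
            return v
    return None


blob = build_sstable([("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)])
```

A lookup touches only the footer, the index, and a single data block, which is the point of placing an index over the blocks.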

When the number of SSTables at a given level reaches a threshold, they are merged into the next level by a compaction strategy.
There are two basic strategies:

  1. size-tiered strategy
  2. leveled strategy
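Both strategies rest on the same basic step: merging several sorted runs into one and keeping only the newest version of each key. The sketch below shows that step, plus a threshold trigger in the spirit of the size-tiered strategy. The names `compact` and `maybe_compact`, the newest-first ordering of runs, and the default threshold of 4 are assumptions for this example.

```python
import heapq


def compact(runs):
    """Merge sorted runs into one sorted run; `runs` is ordered newest first.

    When a key occurs in several runs, only the value from the newest run is kept.
    """
    # Tag each entry with the age of its run so equal keys sort newest first.
    tagged = [
        [(key, age, value) for key, value in run]
        for age, run in enumerate(runs)
    ]
    merged = []
    for key, age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # a newer version of this key was already emitted
        merged.append((key, value))
    return merged


def maybe_compact(levels, level, threshold=4):
    """If a level holds `threshold` runs, merge them into one run in the next level."""
    if len(levels[level]) >= threshold:
        merged = compact(levels[level])
        levels[level] = []
        levels[level + 1].insert(0, merged)  # newest run goes first
        return True
    return False


newest = [("a", 10), ("c", 30)]
older = [("a", 1), ("b", 2)]
oldest = [("b", 0), ("d", 4)]
result = compact([newest, older, oldest])
```

Because every input run is already sorted, the merge reads each run sequentially and writes the output sequentially, which is why compaction fits the sequential-write design.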

3 Operation of LSM tree

Sequential writes improve write performance at the cost of somewhat lower read performance (read amplification), a larger volume of data written (write amplification), and more space occupied (space amplification).

3.1 Insertion, modification and deletion of LSM tree

Insertions, modifications, and deletions in an LSM tree are all applied to the L0 tree, and each record item is stored together with its timestamp.
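One way to picture this is that all three operations become a timestamped write into L0. In the sketch below a delete is recorded as a special marker (commonly called a tombstone) rather than erasing the key; the `L0Tree` class, the `TOMBSTONE` marker, and the logical clock used as a timestamp are assumptions made for illustration.

```python
import itertools

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


class L0Tree:
    """Hypothetical in-memory L0 component; every record carries a timestamp."""

    def __init__(self):
        self.records = {}                # key -> (timestamp, value)
        self.clock = itertools.count(1)  # logical clock standing in for real time

    def put(self, key, value):
        # Insert and modify are the same operation: write a newer record.
        self.records[key] = (next(self.clock), value)

    def delete(self, key):
        # A delete is also a write: it records a tombstone instead of erasing.
        self.records[key] = (next(self.clock), TOMBSTONE)

    def get(self, key):
        record = self.records.get(key)
        if record is None or record[1] is TOMBSTONE:
            return None
        return record[1]


l0 = L0Tree()
l0.put("k", "v1")   # insert
l0.put("k", "v2")   # modify
value_before_delete = l0.get("k")
l0.delete("k")      # delete
```

Recording the delete as a write matters because older versions of the key may still exist in lower levels; the newer timestamp on the marker lets a reader tell that those versions are obsolete.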

3.2 LSM tree search

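A search starts at the L0 level and proceeds downward level by level, stopping at the first record found for the key. Because upper levels hold newer data, the first record found is the latest version.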
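The top-down search can be sketched as follows, with each level modeled as a plain dictionary. The function name `lsm_get` and the sample data are hypothetical.

```python
def lsm_get(levels, key):
    """Search levels from L0 downward; `levels` is a list of dicts, newest first."""
    for depth, level in enumerate(levels):
        if key in level:
            return depth, level[key]  # first hit is the newest version
    return None


levels = [
    {"a": "a-new"},              # L0 (memory)
    {"a": "a-old", "b": "b1"},   # L1
    {"c": "c2"},                 # L2
]
```

Here `lsm_get(levels, "a")` stops at L0 and never sees the older value in L1, while a lookup for a missing key has to check every level before giving up, which is where the read cost comes from.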

4 Comparison of LSM tree and B+tree

  1. The LSM tree has high write throughput and high write performance:
    a write needs no random disk I/O and goes directly to memory (plus a sequential append to the WAL), so a single insertion is fast and the maximum write throughput is higher than that of a B+ tree.
  2. The search efficiency of the LSM tree is lower:
    a search may have to traverse the trees at every level, so lookups are slower than in a B+ tree.
  3. Concurrency control and failure recovery:
    [figure: comparison of concurrency control and failure recovery, image not available]

5 Application of LSM tree

LSM trees are mainly used in NoSQL databases, such as HBase, RocksDB, and LevelDB.
TiDB's underlying storage also uses an LSM tree.

6 Summary

LSM tree features: sequential writes, compaction, and read, write, and space amplification.
Applicable scenarios of the LSM tree: workloads that require high write throughput and can accept somewhat slower reads; at present it is mainly used in NoSQL databases.

Origin blog.csdn.net/afei8080/article/details/129332661