A detailed explanation of Range Split & Merge in the KaiwuDB distributed system

Background

Range (partitioning) is a technique for managing and organizing data in a database. In a distributed system, data is divided into multiple Ranges according to certain rules, and the Ranges are managed dynamically through splitting and merging. This optimizes query performance and helps improve the system's scalability, availability, and load balancing. The main topic of this live broadcast is the Split and Merge of Ranges in the KaiwuDB distributed system.

The following is an excerpt of the content. For the complete content, see the full-version video.

KaiwuDB Range Split

Introduction to the split

splitQueue is responsible for Range splitting. The conditions that trigger Range splitting are as follows:

  • Create a new database or table.
  • Range size exceeds range_max_bytes.
  • The Range's QPS is too high, exceeding kv.range_split.load_qps_threshold (default value 250, configurable).
  • The zone configuration (Configure zone) of an index or partition is modified so that it becomes independent of its parent level.

In special cases, adminSplit is called directly without going through splitQueue:

  • When a large amount of data is imported, one Range is automatically split into multiple Ranges.
  • When data is imported, an empty Range is pre-split for data that may be imported later.
  • Manual split: alter table table_name split at values (key1,key2,…); where values gives the primary key value at which to split. For a composite primary key you can supply multiple values, but no more than the number of primary key columns.
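The size and load triggers in the list above can be pictured with a small check. This is only an illustrative sketch, not KaiwuDB's splitQueue code: the RangeStats type and the should_split function are hypothetical names, the 64 MB limit is borrowed from the debugging example later in this article, and 250 is the default QPS threshold quoted above.

```python
from dataclasses import dataclass

# Illustrative values only: 64 MB is the limit used in the debugging example,
# 250 is the quoted default of kv.range_split.load_qps_threshold.
RANGE_MAX_BYTES = 64 * 1024 * 1024
LOAD_QPS_THRESHOLD = 250


@dataclass
class RangeStats:
    """Hypothetical per-Range statistics."""
    size_bytes: int
    qps: float


def should_split(stats, max_bytes=RANGE_MAX_BYTES, qps_threshold=LOAD_QPS_THRESHOLD):
    """Return the reason a Range should be split ("size" or "load"), or None."""
    if stats.size_bytes > max_bytes:
        return "size"
    if stats.qps > qps_threshold:
        return "load"
    return None


reason = should_split(RangeStats(size_bytes=70 * 1024 * 1024, qps=10.0))
```

The other triggers (new tables, zone configuration changes, imports, manual splits) are driven by events rather than thresholds, so they are left out of this sketch.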

Split algorithm flow chart

A node in KaiwuDB runs a separate background thread/worker that handles Range splitting. Range splitting is divided into two phases: Phase 1 - preparing the Range split parameters; Phase 2 - updating the Range and its index structure.

As shown in the figure, the process on the left is mainly the preparation for splitting the Range:

First, the split key of the Range is determined. Once this key is found, the system adjusts the span of the current Range and uses the split key as its new end key. The adjusted Range becomes the left Range.

The process then creates a new Range for the right side. Its start key is the split key, and its end key is the end key of the original Range. At the same time, the version of the original Range is incremented by 1, and the updated version is applied to both the left and the right Range.
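The parameter preparation described above can be sketched as follows. The RangeDescriptor type, its field names, and prepare_split are hypothetical and chosen only for illustration; keys are plain strings and each Range covers the half-open span from start_key up to (not including) end_key.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RangeDescriptor:
    """Hypothetical descriptor: a key span [start_key, end_key) plus a version."""
    range_id: int
    start_key: str
    end_key: str
    version: int


def prepare_split(desc, split_key, new_range_id):
    """Build the left and right descriptors for splitting desc at split_key."""
    if not (desc.start_key < split_key < desc.end_key):
        raise ValueError("split key must fall strictly inside the Range")
    new_version = desc.version + 1  # the original version is incremented by 1
    # Left Range: same start key, the split key becomes the new end key.
    left = replace(desc, end_key=split_key, version=new_version)
    # Right Range: starts at the split key, ends where the original ended.
    right = RangeDescriptor(new_range_id, split_key, desc.end_key, new_version)
    return left, right


left, right = prepare_split(RangeDescriptor(1, "lem", "str", 3), "pea", 2)
```

Nothing is written at this stage; the two descriptors are only the inputs for the update phase.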

Once the split parameters for the left and right Ranges are ready, the process enters the system data update phase, which consists of preparing the write requests and then processing them. This phase runs within a single transaction, which must complete the following steps:

  • Start a new transaction with status Pending
  • Update the left Range
  • Write the new right Range
  • Update the left Range's search path in the two-level tree structure of the "Big Mac" (giant) index
  • Insert the right Range's search path in the two-level tree structure of the "Big Mac" (giant) index
  • Update the transaction status to Committed
  • Update the MVCC of the left and right Ranges
  • Clean up write intents

At this point, the splitting of the entire Range is completed.
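The transactional update can be pictured with a toy model. This sketch makes several assumptions that are not from the article: the store is a plain dictionary, the index is a flat map keyed by each Range's start key rather than a two-level tree, and the MVCC update step is omitted. SplitTxn and apply_split are hypothetical names.

```python
class SplitTxn:
    """Toy transaction: writes are staged as intents and applied on commit."""

    def __init__(self, store):
        self.store = store
        self.status = "PENDING"   # a new transaction starts as pending
        self.intents = {}

    def put(self, key, value):
        if self.status != "PENDING":
            raise RuntimeError("transaction is no longer pending")
        self.intents[key] = value  # staged, not yet visible in the store

    def commit(self):
        self.status = "COMMITTED"
        self.store.update(self.intents)  # staged writes become visible
        self.intents.clear()             # clean up the write intents


def apply_split(store, left, right):
    """left and right are (range_id, start_key, end_key) tuples."""
    txn = SplitTxn(store)
    txn.put(("range", left[0]), (left[1], left[2]))     # update the left Range
    txn.put(("range", right[0]), (right[1], right[2]))  # write the new right Range
    txn.put(("meta", left[1]), left[0])                 # update left index entry
    txn.put(("meta", right[1]), right[0])               # insert right index entry
    txn.commit()
    return txn


store = {("range", 1): ("lem", "str"), ("meta", "lem"): 1}
apply_split(store, left=(1, "lem", "pea"), right=(2, "pea", "str"))
```

The point of staging the writes is that readers see either the unsplit Range or both halves together with their index entries, never a partial state.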

Split trigger example

The debugging scenario below is triggered when a Range's size exceeds a predetermined threshold. Ranges are processed serially on a timer: each time the timer fires, the size of the Range is checked. If the Range is found to be around 70 MB, exceeding the predetermined limit of 64 MB, the system triggers the split function to split the oversized Range.

The background thread or worker keeps looping over all Ranges, checking each one in turn.
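One pass of such a scan might look like the sketch below. It is a simplification under stated assumptions: Ranges are represented only by their sizes, an oversized Range is cut into two roughly equal halves (the article does not say how the split key is chosen), and scan_and_split is a hypothetical name. A real worker would repeat this pass each time its timer fires.

```python
MB = 1024 * 1024


def scan_and_split(range_sizes, max_bytes=64 * MB):
    """One serial pass over all Ranges; each oversized Range is split in two."""
    result = []
    for size in range_sizes:
        if size > max_bytes:
            half = size // 2
            result.extend([half, size - half])  # left and right halves
        else:
            result.append(size)
    return result


# A 70 MB Range exceeds the 64 MB limit and is split; the 10 MB Range is kept.
sizes_after = scan_and_split([10 * MB, 70 * MB])
```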

KaiwuDB Range Merge

Merge example

As shown in the figure, when the user deletes a large amount of data, the size of the two adjacent Ranges decreases sharply, and the system merges them.

The following figure shows the result of the merge. After the two previous Ranges, lem-str and str-, are merged, str- disappears and only lem- remains: the keys of the two original Ranges now belong to the same Range, lem-. Correspondingly, the second-level data structure of the "Big Mac" (giant) index is adjusted, and pea- disappears from the first-level tree index structure.
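The effect on the key spans can be written down in a few lines. This is only an illustration of the example above, with hypothetical names: each span is a (start_key, end_key) tuple, and None stands for the open end of a Range such as str-.

```python
def merge_adjacent(left, right):
    """Merge two adjacent key spans given as (start_key, end_key) tuples."""
    left_start, left_end = left
    right_start, right_end = right
    if left_end != right_start:
        raise ValueError("Ranges are not adjacent")
    # The merged Range keeps the left start key and takes over the right end key.
    return (left_start, right_end)


# lem-str merged with str- (open-ended) leaves a single Range lem-.
merged = merge_adjacent(("lem", "str"), ("str", None))
```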

Merge conditions

The conditions for merging Ranges are relatively strict, mainly including:

  • Merging is not disabled.
  • There is a next Range, and it has the same zone configuration (Configure zone).
  • The size of the two Ranges to be merged is less than range_min_bytes.
  • The merged Range would not trigger a QPS-based Range split.

The last condition means that if the Range holding the data is hot, the merge is not performed: the merge would immediately trigger a load-based split again, which would keep the system from operating normally.
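The merge conditions can be combined into a single predicate. The sketch below is one possible reading and not KaiwuDB's actual logic: RangeInfo and can_merge are hypothetical names, the size condition is interpreted as each of the two Ranges being smaller than range_min_bytes, and the QPS of the merged Range is estimated as the sum of the two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeInfo:
    """Hypothetical summary of a Range used for the merge decision."""
    size_bytes: int
    qps: float
    zone: str  # identifier of the zone configuration


def can_merge(left: RangeInfo, right: Optional[RangeInfo], *,
              merge_enabled: bool, range_min_bytes: int,
              qps_threshold: float) -> bool:
    if not merge_enabled:                # merging must not be disabled
        return False
    if right is None:                    # there must be a next Range
        return False
    if left.zone != right.zone:          # same zone configuration
        return False
    if left.size_bytes >= range_min_bytes or right.size_bytes >= range_min_bytes:
        return False                     # both Ranges must be small
    if left.qps + right.qps > qps_threshold:
        return False                     # merged Range would be split again
    return True


MB = 1024 * 1024
ok = can_merge(RangeInfo(2 * MB, 20.0, "default"), RangeInfo(3 * MB, 30.0, "default"),
               merge_enabled=True, range_min_bytes=16 * MB, qps_threshold=250)
```

The 16 MB value passed as range_min_bytes is an arbitrary example value, not a documented default.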

Merge algorithm flow chart

As shown in the figure, merging, like splitting, is divided into two phases: Phase 1 - preparing the Range parameters; Phase 2 - updating the Ranges within a transaction. For the complete content, see the full-version video: https://www.bilibili.com/video/BV11y421z7jH/?spm_id_from=333.999.0.0

Merge debugging example

The following two schematic diagrams show the specific process of debugging a Range merge and the corresponding Range parameters.

Origin my.oschina.net/u/5148943/blog/11046913