KV data sharding and distribution

Data organization in KV storage

  1. Hash: the storage location of a key-value pair is determined by a predefined hash function, so the data is not stored in key order. The advantage is that computing the storage location with the hash function is very cheap, so insert, delete, update, and single-point query operations are fast; the main drawback is that range queries cannot be served, because the data is stored out of order.
  2. Ordered arrangement: this supports the full set of key-value access interfaces, including range queries, and usually organizes data with a tree structure (such as a B-tree), so insert, delete, update, and single-point query operations are slightly slower than with the hash method. The sketch after this list contrasts the two.
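
A minimal Go sketch of the difference (an ordinary map stands in for a hash-organized store, a sorted slice stands in for a tree-organized one; the keys and values are made up):

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hash organization: fast point lookups, but the keys are unordered,
	// so a range query would have to scan every entry.
	hashStore := map[string]string{"k3": "v3", "k1": "v1", "k2": "v2"}
	fmt.Println(hashStore["k2"]) // point query, roughly O(1)

	// Ordered organization (a sorted slice stands in for a B-tree):
	// point queries cost O(log n), but range scans become natural.
	keys := []string{"k1", "k2", "k3", "k5", "k9"}
	lo := sort.SearchStrings(keys, "k2") // first index with key >= "k2"
	hi := sort.SearchStrings(keys, "k6") // first index with key >= "k6"
	fmt.Println(keys[lo:hi])             // range scan over [k2, k6): [k2 k3 k5]
}
```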

KV sharding

For a KV system, there are two typical solutions for distributing data across multiple machines:

  • Hash: the Key is hashed, and the storage node is selected according to the hash value.
  • Range: the key space is divided into ranges by Key, and each continuous range of Keys is stored on one storage node. Both schemes are sketched below.
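
A rough Go sketch of the two schemes, assuming an FNV hash and hand-picked split keys (both are illustrative choices, not what any particular system uses):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashShard: hash the Key and map the hash value onto one of numNodes nodes.
func hashShard(key string, numNodes int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numNodes))
}

// rangeShard: splitKeys cut the key space into continuous ranges
// [-inf, s0), [s0, s1), ..., [s_last, +inf); return the index of the range
// (and hence of the node) that owns the key.
func rangeShard(key string, splitKeys []string) int {
	return sort.Search(len(splitKeys), func(i int) bool { return splitKeys[i] > key })
}

func main() {
	fmt.Println(hashShard("user_42", 3))                   // some node in [0, 3)
	fmt.Println(rangeShard("user_42", []string{"g", "p"})) // 2: "user_42" sorts after "p"
}
```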

When a distributed system uses sharding for load balancing, two related issues arise: which node in the distributed system each shard should be placed on (the data distribution problem), and how to find the node that holds the target shard when a user accesses the data.

Most systems first balance the load according to the storage capacity of each node. When capacities are similar, shards are generally placed at random. If the system later becomes seriously imbalanced, a certain amount of data can be migrated while the system load is relatively low to restore the balance.
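
A toy placement routine along these lines, with made-up node capacities and an arbitrary tolerance threshold (purely a sketch of the strategy described above):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Node holds illustrative capacity numbers (any consistent unit works).
type Node struct {
	ID   string
	Used uint64
	Cap  uint64
}

func usage(n Node) float64 { return float64(n.Used) / float64(n.Cap) }

// pickNode places a new shard on the least-loaded node; if all nodes are
// within `tolerance` of each other, it falls back to random placement.
func pickNode(nodes []Node, tolerance float64) Node {
	least, most := nodes[0], nodes[0]
	for _, n := range nodes[1:] {
		if usage(n) < usage(least) {
			least = n
		}
		if usage(n) > usage(most) {
			most = n
		}
	}
	if usage(most)-usage(least) < tolerance {
		return nodes[rand.Intn(len(nodes))] // loads are similar: place randomly
	}
	return least
}

func main() {
	nodes := []Node{{"n1", 60, 100}, {"n2", 20, 100}, {"n3", 25, 100}}
	fmt.Println(pickNode(nodes, 0.05).ID) // n2, the clearly least-loaded node
}
```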

Metadata: the distribution of shards across nodes and, in range-sharding mode, the key range each shard is responsible for.

How metadata is maintained:

The first approach is a dedicated metadata server. When users access data, they contact the metadata server first and then go to the specific data server it points to; to reduce the network traffic caused by metadata lookups, the client generally keeps a metadata cache. A typical system is TiDB, which contains an important subsystem, the Placement Driver (PD), dedicated to maintaining metadata; PD is itself a distributed system, which keeps the metadata service reliable.

The second approach is for every node to store the metadata, which makes it easy to locate data on access; the metadata is continuously kept in sync between nodes. For example, each data node in CockroachDB maintains the metadata of all shards, and the metadata is synchronized between nodes through the Gossip protocol.
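
Whichever way it is maintained, the metadata boils down to roughly the shape below; the struct and `Locate` helper are an illustrative sketch, not the actual PD or CockroachDB data structures:

```go
package main

import "fmt"

// ShardMeta records where a shard lives and, in range-sharding mode,
// which key interval [StartKey, EndKey) it is responsible for.
type ShardMeta struct {
	ShardID  uint64
	StartKey string
	EndKey   string // "" means unbounded on the right
	Node     string
}

// MetaStore stands in for either a dedicated metadata service (like PD)
// or the per-node metadata copy kept by systems such as CockroachDB.
type MetaStore struct {
	shards []ShardMeta // assumed sorted by StartKey and covering the key space
}

// Locate returns the shard that owns the key; a client would typically cache
// the result to avoid asking the metadata service on every access.
func (m *MetaStore) Locate(key string) ShardMeta {
	for _, s := range m.shards {
		if key >= s.StartKey && (s.EndKey == "" || key < s.EndKey) {
			return s
		}
	}
	return ShardMeta{} // no owner; should not happen if the ranges cover the space
}

func main() {
	meta := &MetaStore{shards: []ShardMeta{
		{ShardID: 1, StartKey: "", EndKey: "g", Node: "node-a"},
		{ShardID: 2, StartKey: "g", EndKey: "p", Node: "node-b"},
		{ShardID: 3, StartKey: "p", EndKey: "", Node: "node-c"},
	}}
	fmt.Println(meta.Locate("kv_system").Node) // node-b
}
```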

TiKV

Reference: https://book.tidb.io/session1/chapter2/tidb-storage.html

TiKV chooses the Range approach: it divides the entire Key-Value space into many segments, each consisting of a series of consecutive Keys. Each segment is called a Region, and TiKV tries to keep the data stored in each Region within a certain size, currently 96MB by default. Each Region can be described by a left-closed, right-open interval [StartKey, EndKey).
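
As an illustration, a Region could be modeled as below, with a naive split check once it outgrows the 96MB target. This is only a sketch: real TiKV chooses the split key from its actual data distribution, and the halving of sizes here is just a placeholder.

```go
package main

import "fmt"

// Region covers the left-closed, right-open key interval [StartKey, EndKey).
type Region struct {
	ID       uint64
	StartKey string
	EndKey   string // "" means unbounded on the right
	SizeMB   uint64
}

const regionTargetMB = 96 // the default Region size mentioned in the text

// maybeSplit cuts a Region in two at splitKey once it has grown past the
// target size; the two halves still cover exactly the original interval.
func maybeSplit(r Region, splitKey string, nextID uint64) []Region {
	if r.SizeMB <= regionTargetMB {
		return []Region{r}
	}
	left := Region{ID: r.ID, StartKey: r.StartKey, EndKey: splitKey, SizeMB: r.SizeMB / 2}
	right := Region{ID: nextID, StartKey: splitKey, EndKey: r.EndKey, SizeMB: r.SizeMB - r.SizeMB/2}
	return []Region{left, right}
}

func main() {
	r := Region{ID: 1, StartKey: "a", EndKey: "z", SizeMB: 130}
	for _, part := range maybeSplit(r, "m", 2) {
		fmt.Printf("region %d: [%q, %q), %dMB\n", part.ID, part.StartKey, part.EndKey, part.SizeMB)
	}
}
```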

After dividing the data into Regions, TiKV will do two important things:

  • Distribute the data across all nodes in the cluster in units of Regions, and try to keep the number of Regions served by each node roughly equal
  • Perform Raft replication and membership management in units of Regions

The first point: the data is split into many Regions by Key, and each Region's data lives on only one node (setting multiple replicas aside for now). The TiDB system has a component, PD, that is responsible for spreading Regions as evenly as possible across all nodes in the cluster; this both scales storage capacity horizontally (Regions get scheduled onto newly added nodes) and balances the load. To make sure upper-layer clients can reach the data they need, PD also records where each Region lives, so that given any Key the system can find the Region that contains it and the node that Region currently sits on (the Key's routing information).
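
The "keep the Region count roughly equal" part can be pictured as a simple rebalancing step like the sketch below; real PD scheduling considers far more than a raw Region count, so this is only an illustration of the idea:

```go
package main

import "fmt"

// rebalanceStep looks at how many Regions each store serves and suggests
// moving one Region from the most-loaded store to the least-loaded one.
func rebalanceStep(regionCount map[string]int) (from, to string, ok bool) {
	for store, c := range regionCount {
		if from == "" || c > regionCount[from] {
			from = store
		}
		if to == "" || c < regionCount[to] {
			to = store
		}
	}
	// Only move a Region if the spread is big enough to matter.
	if regionCount[from]-regionCount[to] <= 1 {
		return "", "", false
	}
	return from, to, true
}

func main() {
	counts := map[string]int{"store-1": 40, "store-2": 31, "store-3": 30}
	if from, to, ok := rebalanceStep(counts); ok {
		fmt.Printf("move one Region from %s to %s\n", from, to)
	}
}
```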

The second point: TiKV replicates data in units of Regions, that is, each Region's data is kept in multiple copies, and TiKV calls each copy a Replica. Replicas stay consistent with each other through Raft. The Replicas of one Region are stored on different nodes and form a Raft Group; one Replica acts as the group's Leader and the others as Followers. All reads and writes go through the Leader: reads are served directly by the Leader, while writes are replicated from the Leader to the Followers.
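
A toy picture of those read/write paths in a Raft Group. This is not a Raft implementation (real Raft involves logs, terms, and quorum acknowledgement before a write commits); the node names are made up:

```go
package main

import "fmt"

// Replica is one copy of a Region's data; exactly one Replica in the group
// acts as Leader, the others as Followers.
type Replica struct {
	Node     string
	IsLeader bool
	Data     map[string]string
}

// RaftGroup bundles the Replicas of one Region.
type RaftGroup struct {
	Replicas []*Replica
}

func (g *RaftGroup) leader() *Replica {
	for _, r := range g.Replicas {
		if r.IsLeader {
			return r
		}
	}
	return nil
}

// Write is applied on the Leader, which then replicates the entry to the
// Followers (real Raft appends to a log and waits for a quorum first).
func (g *RaftGroup) Write(key, value string) {
	ldr := g.leader()
	ldr.Data[key] = value
	for _, r := range g.Replicas {
		if r != ldr {
			r.Data[key] = value
		}
	}
}

// Read is served by the Leader.
func (g *RaftGroup) Read(key string) string {
	return g.leader().Data[key]
}

func main() {
	g := &RaftGroup{Replicas: []*Replica{
		{Node: "tikv-1", IsLeader: true, Data: map[string]string{}},
		{Node: "tikv-2", Data: map[string]string{}},
		{Node: "tikv-3", Data: map[string]string{}},
	}}
	g.Write("k1", "v1")
	fmt.Println(g.Read("k1")) // v1, read from the Leader
}
```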

Origin: blog.csdn.net/qq_47865838/article/details/128518266