Some thoughts on CAP theory and MongoDB consistency and availability


  About five or six years ago, I first came across NoSQL, which was already a hot topic at the time. Back then all I knew was MySQL; NoSQL was new to me and I had never actually used it. What left an impression on me, though, was a picture like the following (I later found via Google that it originally came from here):

  [figure: the CAP triangle, with relational and NoSQL databases placed in the CA, CP, and AP camps]

    The picture shows how various databases (both traditional relational databases and NoSQL) relate to the CAP theorem. Since I had no practical experience with NoSQL and only a superficial understanding of CAP theory, it was not clear to me why a particular database was placed in a particular camp.

    After starting work I have used MongoDB a lot and gained a reasonable understanding of it. When I saw this picture again some time ago, I wanted to find out whether MongoDB really belongs to the CP camp, and why. I suspect the question arises because the classic (officially recommended) deployment architecture of MongoDB is the replica set, and a replica set provides high availability through redundancy and automatic failover, so why does the figure above show MongoDB sacrificing availability? I searched MongoDB's official documentation for "CAP" and found nothing, so I decided to work out the answer for myself.

  This article first clarifies what the CAP theorem says, surveys some articles about it, and then discusses the trade-offs MongoDB makes between consistency and availability.

CAP theory

Before this I only knew what the three letters of CAP stood for, and my explanations came from articles on the Internet, which may not be accurate. So first we have to go back to the source to find the origin and the precise statement of the theorem. I think the best place to start is Wikipedia, which gives a fairly accurate introduction and, more importantly, many useful links, such as the origin of the CAP theorem and its later development.

  The CAP theorem states that a distributed data store can satisfy at most two of the following three properties at the same time: consistency (C), availability (A), and partition tolerance (P).

  Consistency: every read either returns the most recently written data or returns an error.

  Availability: every request receives a timely, non-error response, but there is no guarantee that the response reflects the most recently written data.

  Partition tolerance: the system continues to provide service (consistency or availability) even if network problems between nodes cause some messages to be dropped or delayed.

  Consistency and availability are both very broad terms that mean different things in different contexts. For example, Brewer pointed out in cap-twelve-years-later-how-the-rules-have-changed that "consistency in CAP and consistency in ACID are not the same problem". So unless otherwise specified, consistency and availability in the rest of this article refer to their definitions in the CAP theorem; a discussion is only meaningful when everyone shares the same context.

    

  For a distributed system, network partitions are unavoidable and replication between nodes always involves some delay. If consistency must be guaranteed (every read returns the most recently written data), then the system is bound to be unavailable (unable to serve reads) for some period of time, i.e. availability is sacrificed, and vice versa.

  According to Wikipedia, the ideas behind CAP date back to 1998. Brewer presented the CAP conjecture at PODC (Symposium on Principles of Distributed Computing) in 2000 [3]. In 2002, Seth Gilbert and Nancy Lynch proved Brewer's conjecture, turning it from a conjecture into a theorem [4].

  

Origin of CAP theory

  In Towards Robust Distributed Systems, Brewer, the creator of the CAP theorem, pointed out that in a distributed system computation is relatively easy; the real difficulty is maintaining state. For distributed storage or data-sharing systems it is likewise hard to guarantee data consistency. Traditional relational databases prioritize consistency over availability, which is why the ACID properties of transactions were proposed. For many distributed storage systems, availability matters more than consistency, and they instead follow BASE (Basically Available, Soft state, Eventual consistency). The following picture shows the difference between ACID and BASE:

  [figure: slide comparing ACID and BASE]

  In short, BASE tries to keep the service available by settling for eventual consistency. Note the last sentence in the picture, "But I think it's a spectrum": ACID and BASE are a matter of degree, not two opposite extremes.

  

  In 2002, in Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, the two authors proved the CAP conjecture using an asynchronous network model, upgrading Brewer's conjecture to a theorem. To be honest, though, I did not fully understand the paper.

  

In the 2009 article brewers-cap-theorem, the author gave a relatively simple proof:

  [figure: nodes N1 and N2 both storing value V (state V0), with write algorithm A on N1, read algorithm B on N2, and replication message M from N1 to N2]

  As shown in the figure above, two nodes N1 and N2 store the same data V, currently in state V0. A safe, reliable write algorithm A runs on node N1, and an equally reliable read algorithm B runs on node N2; that is, N1 handles writes and N2 handles reads. Data written to N1 is automatically synchronized to N2, and the synchronization message is called M. If a partition occurs between N1 and N2, there is no guarantee that M will reach N2 within any bounded time.
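
  To make the argument concrete, here is a minimal, purely illustrative Python sketch. The node class, the partitioned flag, and the require_consistency switch are my own inventions, not part of the cited proof: if message M cannot be delivered, N2 must either serve the stale value V0 (giving up consistency) or refuse to answer (giving up availability).

```python
# Illustrative sketch of the N1/N2 argument; not a real distributed system.

class Node:
    def __init__(self, name, value):
        self.name = name
        self.value = value

n1 = Node("N1", "V0")          # accepts writes
n2 = Node("N2", "V0")          # serves reads
partitioned = True             # message M cannot get from N1 to N2

def write(value):
    n1.value = value
    if not partitioned:
        n2.value = value       # replication message M delivered

def read(require_consistency):
    if partitioned and require_consistency:
        # choose C: refuse to answer rather than return stale data
        raise RuntimeError("unavailable: cannot confirm latest value")
    # choose A: answer immediately, possibly with stale data
    return n2.value

write("V1")
print(read(require_consistency=False))   # -> "V0", available but stale
try:
    print(read(require_consistency=True))
except RuntimeError as e:
    print(e)                              # consistent but unavailable
```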

  Looking at the same problem from the perspective of a transaction:

  

   Transaction α consists of operations α1 and α2, where α1 writes data and α2 reads it. On a single node it is easy to guarantee that α2 reads what α1 wrote. In a distributed setting, unless the timing of α2 can be controlled, there is no such guarantee, and any form of control (blocking, centralizing the data, etc.) either destroys partition tolerance or sacrifices availability.

  The article also points out that in many cases availability matters more than consistency; for sites like Facebook and Google, even brief unavailability means huge losses.

  

The 2010 article brewers-cap-theorem-on-distributed-systems uses three examples to illustrate CAP: example 1, a single MySQL instance; example 2, two MySQL instances, each storing a different subset of the data (similar to sharding); example 3, two MySQL instances where an insert on A is only considered complete once it has also succeeded on B (similar to a replica set). The author argues that examples 1 and 2 can guarantee strong consistency but not availability, while in example 3 the possibility of a partition forces a trade-off between consistency and availability.

  In my opinion the CAP theorem is best discussed in the context of a distributed storage system, where availability refers not to the overall service but to the individual nodes of the system. So I don't find the examples above entirely appropriate.

CAP Theory Development

    In 2012 Brewer, the inventor of the CAP theorem, wrote another article on it, "CAP Twelve Years Later: How the 'Rules' Have Changed". A decent Chinese translation is also available online.

  The main point of the article is that CAP should not be read as a simple "pick two of three". First, although any distributed system can experience partitions, partitions are rare (if they are not, the network or hardware needs fixing), so the system can offer perfect C and A most of the time; only while a partition is in effect does it have to balance C against A. Second, both consistency and availability are matters of degree, not 0 or 1: availability varies continuously from 0% to 100%, and consistency comes in many levels (in Cassandra, for example, you can set the consistency level). The goal of modern CAP practice should therefore be to maximize both consistency and availability, within limits that are reasonable for the specific application.

  The article also points out that a partition is a relative notion: when a predefined communication time limit is exceeded, i.e. the system cannot reach consistency within that limit, a partition is deemed to have occurred, and the current operation must choose between C and A.

  In terms of revenue goals and contractual obligations, system availability is usually the primary goal, so we routinely use caching or after-the-fact update logs to optimize it. When a designer chooses availability, it is because the broken invariants can be restored after the partition ends.

  In practice, most teams assume that partitions do not occur within a single data center (a single physical location), so CA can be chosen inside one data center; systems designed before the CAP theorem, including traditional databases, defaulted to this assumption.

  During a partition, independent and self-consistent subsets of nodes can continue to operate, but there is no guarantee that global invariants will not be violated. Data sharding is an example: the designer splits the data across partition nodes in advance, and during a partition each individual shard can probably keep operating. Conversely, if the states on the two sides of the partition are intrinsically related, or there are global invariants that must be preserved, then at best only one side of the partition can operate, and at worst no operation can proceed at all.

  The underlined part of the excerpt above describes a situation very similar to MongoDB's sharding: in MongoDB's sharded cluster mode, shards normally do not need to communicate with each other.

  In the 2013 article "better-explaining-cap-theorem", the author points out that "it is really just A vs C!", because:

  (1) availability is generally achieved by replicating data across different machines;

  (2) consistency requires updating several nodes simultaneously before allowing reads;

  (3) temporary partitions, i.e. communication delays between nodes, can always occur, and only then does the trade-off between A and C have to be made.

  In a distributed system, network partitions are bound to happen, so "it is really just A vs C!"

MongoDB and CAP        

In the article "Understanding MongoDB by creating a sharded cluster step by step", I introduced some of MongoDB's characteristics: high performance, high availability, and horizontal scalability. MongoDB's high availability comes from replica set replication with automatic failover. MongoDB can be deployed in three modes: standalone, replica set, and sharded cluster; the previous article described how to build a sharded cluster in detail.

  Standalone is a single mongod to which the application connects directly; there is no partition tolerance to speak of, and it is necessarily strongly consistent. In a sharded cluster each shard is also recommended to be a replica set. MongoDB shards hold independent subsets of the data, so a partition between shards has little impact (it can still matter during chunk migration), and the main concern is the effect of a partition within the replica set inside a shard. This article's discussion of MongoDB's consistency and availability therefore focuses on the replica set.

  In a replica set there is only one primary, which accepts both writes and reads, while the secondaries accept only reads. This is a single-writer, multiple-reader setup, much simpler than multi-writer, multi-reader. For the discussion below, assume the replica set consists of three nodes, one primary and two secondaries, all of which are data-bearing.
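
  As a concrete setup, here is a minimal pymongo sketch for connecting to such a three-member replica set. The hostnames (node1, node2, node3) and the replica set name rs0 are hypothetical placeholders, not values from this article:

```python
from pymongo import MongoClient

# Hypothetical hosts and replica set name; adjust to your own deployment.
client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0"
)

db = client["test"]
# Reports the replica set topology, including which member is currently primary.
print(db.command("isMaster"))
```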

  MongoDB's trade-off between consistency and availability depends on three factors: write concern, read concern, and read preference. The discussion below mainly covers MongoDB 3.2, since read concern was introduced in that version.

write-concern:

  Write concern specifies when MongoDB acknowledges a write operation to the client. It contains the following three fields (a short pymongo example follows this section):

  { w: <value>, j: <boolean>, wtimeout: <number> }

  w: the write is acknowledged to the client after it has been applied by <value> MongoDB instances. Possible values:

    1: the default; the write is acknowledged once it has been applied on the standalone mongod or on the primary of the replica set

    0: acknowledge immediately without waiting for the write to be applied; this is fast, but data may be lost. It can be combined with j: true to increase durability

    >1: only meaningful for a replica set; if the value exceeds the number of nodes in the replica set, the write may block

    'majority': the write is acknowledged once it has been applied on a majority of the replica set's nodes. This is generally used together with read concern "majority":

    After the write operation returns with a w: "majority" acknowledgement to the client, the client can read the result of that write with a "majority" readConcern

  j: the write is acknowledged only after it has been written to the on-disk journal; the default is false. Two points to note:

    specifying j: true against a MongoDB instance that has journaling disabled produces an error

    in MongoDB 3.2 and later, when j: true is combined with w > 1, all of the requested members (not just the primary) must write to the journal before the write is acknowledged

  wtimeout: a time limit in milliseconds; if the write concern (for w greater than 1) cannot be satisfied within that time, an error is returned

    the default is 0, which is equivalent to not setting a time limit

  MongoDB 3.4 added the writeConcernMajorityJournalDefault option, which changes how different combinations of w and j behave:

  [table: behavior of the different combinations of w and j]
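
  To make the options above concrete, here is a small pymongo sketch; the collection name orders and the documents are made up for illustration. It issues the same kind of insert with two different write concerns:

```python
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["test"]

# Default write concern: acknowledged by the primary alone (w: 1).
fast = db.get_collection("orders", write_concern=WriteConcern(w=1))
fast.insert_one({"item": "pencil", "qty": 5})

# Majority write concern with journaling and a 5 second timeout:
# the insert is acknowledged only once a majority of members have applied it.
safe = db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
)
safe.insert_one({"item": "notebook", "qty": 2})
```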

read-preference:

As explained above, a replica set consists of one primary and several secondaries. The primary accepts all writes, so its data is always up to date; the secondaries replay writes from the oplog, so their data lags slightly. For queries that are not very sensitive to freshness, reading from a secondary reduces the load on the primary.

  

  MongoDB points out that read preference can be chosen flexibly for different situations. The MongoDB drivers support the following read preference modes:

  primary: default mode, all read operations are routed to the primary node of the replica set

  primaryPreferred: reads normally go to the primary, and only when the primary is unavailable (e.g. during failover) do they go to a secondary

  secondary: all read operations are routed to secondary nodes of the replica set

  secondaryPreferred: reads normally go to a secondary, and only when no secondary is available do they go to the primary

  nearest: read from the node with the lowest network latency, whether primary or secondary. For applications distributed across multiple data centers where MongoDB is deployed in each, nearest gives the best data locality.

  If you use secondary or secondaryPreferred, be aware of the following (a short pymongo sketch follows this list):

  (1) because of replication lag, the data read may not be the latest, and different secondaries may return different data;

  (2) for a sharded collection (the balancer is enabled by default), a chunk migration that is still in progress or was aborted abnormally may cause a secondary to return missing or duplicated documents;

  (3) when there are several secondaries, which one is chosen? In short, the "closest" one, i.e. the one with the lowest average latency; see the Server Selection Algorithm for details.
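
  As a rough pymongo illustration (the collection name and query are made up; this is a sketch under the three-node assumption above, not the only way to set read preference), reads can be directed at secondaries like this:

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["test"]

# Route reads through this handle to a secondary when one is available,
# falling back to the primary otherwise.
reporting = db.get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)
for doc in reporting.find({"item": "pencil"}):
    print(doc)  # may lag slightly behind the primary
```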

read-concern:

  Read concern was added in MongoDB 3.2. It specifies what kind of data a replica set (including replica set shards in a sharded cluster) returns for a read, and different storage engines support it to different degrees.

  read concern has the following three levels:

  local: the default; returns the most recent data on the node being read, where the choice of node depends on the read preference.

  majority: returns the most recent data that has been acknowledged as written to a majority of the nodes. Using this level requires the WiredTiger storage engine, replica set election protocol version 1, and starting the MongoDB instance with the --enableMajorityReadConcern option.

  linearizable: introduced in version 3.4; not covered here, see the documentation if interested.

  The read concern documentation contains this sentence:

Regardless of the read concern level, the most recent data on a node may not reflect the most recent version of the data in the system.

  In other words, even with readConcern: majority the returned data is not necessarily the latest, which differs from NWR-style quorum reads. The root cause is that the returned value ultimately comes from a single MongoDB node, and which node that is depends on the read preference.

  This article explains in detail why readConcern was introduced and how it is implemented; only the core part is quoted here:

readConcern was originally introduced to solve the "dirty read" problem: for example, a user reads a piece of data from the primary, but that data has not yet been replicated to a majority of nodes, and then the primary fails; after recovery, the primary rolls back the data that was never replicated to a majority, so the user has read "dirty data".

When the readConcern level is set to majority, the user is guaranteed to read only data that "has been written to a majority of nodes", and such data will never be rolled back, which avoids dirty reads.
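
 As a rough pymongo sketch (the collection name is again made up), a majority read concern can be requested on a collection handle like this:

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["test"]

# Only return data acknowledged by a majority of members,
# i.e. data that cannot be rolled back after a failover.
durable_reads = db.get_collection("orders", read_concern=ReadConcern("majority"))
print(durable_reads.find_one({"item": "notebook"}))
```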

 Consistency or availability?

  Recall the definitions of consistency and availability in the CAP theorem:
  Consistency: every read either returns the most recently written data or returns an error.
  Availability: every request receives a timely, non-error response, but there is no guarantee that the response reflects the most recently written data.

  As mentioned earlier, the discussion of consistency and availability here is based on the replica set; whether or not it sits inside a sharded cluster makes no difference. The discussion also assumes a single client; with multiple clients the question becomes one of isolation, which is outside the scope of the CAP theorem. Based on the understanding of write concern, read concern, and read preference above, we can draw the following conclusions (a configuration sketch follows the list):

  • With the defaults (w: 1, readConcern: local) and read preference primary, reads return the latest data, i.e. strong consistency; but if the primary fails at that moment, an error is returned and availability is not guaranteed.
  • With the defaults (w: 1, readConcern: local) and a secondary-capable read preference (secondary, secondaryPreferred, primaryPreferred), reads may return stale data, but a response comes back promptly, so availability is better.
  • writeConcern: majority guarantees that written data will not be rolled back; readConcern: majority guarantees that the data read will not be rolled back.
  • With (w: 1, readConcern: majority), even a read from the primary is not guaranteed to return the latest data, so consistency is weak.
  • With (w: majority, readConcern: majority), a read from the primary is guaranteed to return the latest data, and that data will not be rolled back, but write availability is worse; a read from a secondary is not guaranteed to return the latest data, so consistency is weak.
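
  As a final pymongo sketch (hypothetical hosts and collection; one reasonable way, not the only one, to express the strongest combination above), the (w: majority, readConcern: majority, readPreference: primary) configuration can be attached to a single collection handle:

```python
from pymongo import MongoClient, ReadPreference, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

# Strongest combination discussed above: majority writes, majority reads,
# reads served by the primary. Trades some availability for consistency.
orders = client["test"].get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", wtimeout=5000),
    read_concern=ReadConcern("majority"),
    read_preference=ReadPreference.PRIMARY,
)

orders.insert_one({"item": "eraser", "qty": 10})
print(orders.find_one({"item": "eraser"}))  # reads back the acknowledged write
```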


  Looking back, the high availability MongoDB advertises is availability in a more general sense: through data replication and automatic failover, the cluster can recover within a short time and keep working even after a physical failure, and the recovery is automatic. In that sense it is indeed highly available.

Source: blog.csdn.net/weixin_45925028/article/details/132234801