Revisiting the CAP Theorem: Clearing Up Some Stubborn Misunderstandings

Introduction

After reading two excellent articles (in translation), I realized that my earlier understanding of CAP had some deviations, which in turn caused problems with my understanding of BASE. Reading them carefully and with some humility cleared up several of those misunderstandings. Of course, I cannot claim to have fully understood everything: I have not read the original CAP and BASE papers, and I lack the experience of real industrial projects. This article discusses a few points about CAP that I had not appreciated before and that caused me plenty of confusion while learning. In hindsight, my grasp of the basic theory was simply not solid enough; knowing these details earlier would have saved me many detours, and that is the motivation for writing this article.

The two articles are "CAP Twelve Years Later: How the 'Rules' Have Changed" and "BASE: An Acid Alternative". The author of the former has a distinctive style: the key points are often stated in broad strokes, with little elaboration around the memorable one-liners, yet they are quite thought-provoking. The latter overturned my long-held view of BASE: BASE is not merely an extension of CAP, it is also a compromise with ACID. That was unexpected for me, and it turns out to be an important point.

Here is a brief summary of the most thought-provoking ideas:

《CAP Twelve Years Later: How the “Rules” Have Changed》

  • In fact, the only thing the CAP theorem forbids is the rare case of perfect data consistency and perfect availability while a partition exists.

The hidden implication is that the CAP theorem is not a simple "pick two of three". The "two out of three" formulation has always been misleading, because it oversimplifies the relationships among the three properties.

Partitions are rare in most systems, so there is no reason to sacrifice consistency or availability during the long stretches when no partition exists. When a system claims to be CP or AP, it is really describing the choice it makes when a partition does occur, and even then it does not completely discard the other property; it only relaxes it to some degree. For example, when request latency matters, strong consistency will noticeably slow the system down, and we may choose to lower the consistency requirement to improve performance. That is a trade-off made to meet a latency requirement, not a sacrifice made for availability in the CAP sense. Moreover, consistency can be subdivided into many levels, availability can vary continuously from 0% to 100%, and different parts of the same system can even disagree about whether a partition exists. A minimal sketch of a per-request consistency choice follows.
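To make "consistency as a dial rather than a switch" concrete, here is a minimal, purely illustrative Python sketch; the `Consistency` enum and `read` helper are invented for this example and not taken from the article:

```python
from enum import Enum

class Consistency(Enum):
    ONE = 1      # answer from any single replica: lowest latency, weakest guarantee
    QUORUM = 2   # answer confirmed by a majority: higher latency, stronger guarantee

def read(replicas, key, level):
    """Toy read path: the caller decides how much consistency to pay for."""
    if level is Consistency.ONE:
        return replicas[0].get(key)                 # fastest answer wins
    values = [r.get(key) for r in replicas]         # ask everyone
    quorum = len(replicas) // 2 + 1
    for v in set(values):                           # value seen by a majority, if any
        if values.count(v) >= quorum:
            return v
    raise RuntimeError("no quorum reached")

replicas = [{"x": 1}, {"x": 1}, {"x": 2}]           # one replica is stale
print(read(replicas, "x", Consistency.QUORUM))      # 1 (the majority value)
```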

  • The granularity at which CAP applies is the data, not the entire system.

A distributed system stores many kinds of data, and different data deserves different priorities: the priority of bank balance data is obviously higher than that of today's leaderboard data. For a hypothetical bank management system, it therefore makes little sense to declare the whole system AP or CP; the choice should be made separately for different classes of data. So why do so many open-source components claim to be AP or CP? Because the claim refers to the data stored inside them; all we have to do is put the data that needs a particular guarantee into the component that declares it. A sketch of such a per-data-class policy is given below.
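A tiny, hypothetical illustration of attaching the CP/AP decision to classes of data rather than to the whole system (all names and policy fields here are invented):

```python
# Each class of data carries its own partition-time behaviour.
DATA_POLICIES = {
    "account_balance":   {"on_partition": "refuse_writes", "consistency": "strong"},   # CP-leaning
    "daily_leaderboard": {"on_partition": "keep_serving",  "consistency": "eventual"}, # AP-leaning
}

def handle_write(data_class: str, partitioned: bool) -> str:
    policy = DATA_POLICIES[data_class]
    if partitioned and policy["on_partition"] == "refuse_writes":
        return "rejected: waiting for the partition to heal"
    return "accepted"

print(handle_write("account_balance", partitioned=True))    # rejected
print(handle_write("daily_leaderboard", partitioned=True))  # accepted
```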

  • The classic interpretation of the CAP theorem ignores network latency.

Brewer did not take latency into account when defining consistency; the definition implicitly assumes that once a transaction commits, the data is replicated to every node instantaneously. This leads to a surprising conclusion: from the server's perspective there is no absolute consistency in a distributed system. Everything is eventual consistency, and the only difference is how long "eventually" takes (we are talking about consistency from the server's perspective here, not from the user's perspective).

It also tells us that, in practice, a partition is equivalent to a time bound on communication: we cannot detect a partition directly, we can only infer one from a timeout. Suppose two kinds of messages can time out: heartbeat packets, and the data packets used to reach agreement. A timed-out heartbeat tells us a partition has probably occurred but requires no further action by itself; a timed-out agreement message forces a choice between C and A. That choice is the core of the problem: either keep serving without cross-partition communication, choosing availability, and merge the conflicting requests after the partition heals; or let only one side serve (or neither side), choosing consistency. When there is no partition, choose between C and A according to actual needs. In a cross-region system it is often worthwhile to give up strong consistency to avoid the high latency of keeping replicas in sync; the several variants of eventual consistency [6] are entirely sufficient in many situations. A minimal timeout-based partition detector is sketched below.
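Since a partition is only ever inferred from silence, a detector can be nothing more than a heartbeat table plus a timeout. A minimal sketch, with the timeout value chosen arbitrarily for illustration:

```python
import time

HEARTBEAT_TIMEOUT = 2.0   # seconds; an assumed value, tuned per deployment in practice

class PartitionDetector:
    """Toy detector: a peer we have not heard from within the timeout window
    is treated as being on the other side of a partition."""
    def __init__(self):
        self.last_heartbeat = {}                    # peer id -> last time we heard from it

    def on_heartbeat(self, peer: str) -> None:
        self.last_heartbeat[peer] = time.monotonic()

    def suspects_partition(self, peer: str) -> bool:
        last = self.last_heartbeat.get(peer)
        return last is None or time.monotonic() - last > HEARTBEAT_TIMEOUT

detector = PartitionDetector()
detector.on_heartbeat("node-2")
print(detector.suspects_partition("node-2"))        # False: heard from it just now
print(detector.suspects_partition("node-3"))        # True: never heard from it at all
```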

The article also cites examples from Yahoo and Facebook. Both are, in essence, eventual consistency, but at least judging from the short descriptions in the article they are not strictly read-your-writes consistency, because both rely on assumptions about how long users delay before acting again. This also illustrates the difference between engineering and academia: academia tells you not to enter the casino because in the long run you will lose money, while engineering practice tells you that if you are willing to accept a certain amount of risk, you may well come out slightly ahead.

  • " The scope of consistency " actually reflects the notion that the state is consistent within a certain boundary, but beyond the boundary, there is no way to talk about it.

I was initially confused by this passage. The example given in the article is that complete consistency and availability can be guaranteed inside the primary partition, while services outside that partition are unavailable. This is much like a quorum system: when a partition occurs, only the side holding the leader (the majority side) can continue to provide service, as in the sketch below.
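A minimal sketch of that majority rule, assuming a fixed cluster size known to every node:

```python
def can_serve(nodes_reachable: int, cluster_size: int) -> bool:
    """Only the side of the partition that can still reach a majority
    of the cluster keeps serving; the minority side refuses requests."""
    return nodes_reachable >= cluster_size // 2 + 1

# A 5-node cluster split 3/2 by a partition:
print(can_serve(3, 5))   # True  -> the majority side keeps serving
print(can_serve(2, 5))   # False -> the minority side stops serving
```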

The article gives another example: data sharding. If some shards are separated by a partition, each shard can continue to serve on its own; but if the shards are closely related, or some global invariant must be preserved, then only one partition can provide service, or the whole cluster must stop serving.

Returning to the original sentence, my understanding is that the consistency scope chosen while a partition is in effect amounts to a form of eventual consistency. If global strong consistency is required during the partition, then all shards must stop serving; if a lower level of consistency is acceptable, some nodes can keep serving and the data is reconciled after the partition ends.

There is a very interesting idea here about offline mode: treat being offline as a long-lived partition and choose A for its duration. This is very useful for improving user satisfaction, but it inevitably produces inconsistent data, so the moment the device reconnects the "partition" ends and the data is synchronized.

  • Sacrificing a property does not mean doing nothing: you need to prepare for partition recovery.

This implies that we do not simply refuse service outright, although some systems do make that choice, such as Redis Cluster.

The sentence itself is easy to understand: we want the system to provide liveness, that is, service cannot be refused forever. Take the example in the article: for invariants that must hold during a partition, the designer should prohibit or modify the operations that might violate them. In other words, some operations cannot be executed during the partition, which implicitly sacrifices availability; we therefore need to record those operations and execute them automatically after the partition ends, restoring the data without breaking global invariants. A sketch of such a deferred-operation log is shown below.
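A minimal sketch of "sacrifice availability, but prepare for recovery": operations that would risk a global invariant are rejected during the partition yet recorded, then replayed once the partition heals. The class and method names are invented for this illustration:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DeferredLog:
    pending: List[Callable[[], None]] = field(default_factory=list)

    def submit(self, op: Callable[[], None], partitioned: bool, risky: bool) -> str:
        if partitioned and risky:
            self.pending.append(op)        # refuse now, remember for later
            return "deferred"
        op()                               # safe to apply immediately
        return "applied"

    def on_partition_healed(self) -> None:
        for op in self.pending:            # replay in the original order
            op()
        self.pending.clear()

log = DeferredLog()
print(log.submit(lambda: print("withdraw 100"), partitioned=True, risky=True))   # deferred
log.on_partition_healed()                                                        # withdraw 100
```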

If consistency is sacrificed instead, every partition remains available during the split. Availability is preserved, but some global invariants may be violated, and they have to be restored after the partition is healed. One approach mentioned in the article is to roll the database back to the right moment and re-execute all operations in an unambiguous, deterministic order, so that in the end every node reaches the same state. This is actually similar to how the Raft algorithm handles conflicts: the leader's log is authoritative, and a follower rolls back its own log until it matches the leader's, discarding all the entries after the divergence point. A minimal deterministic-merge sketch follows.
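A toy illustration of "roll back and replay in a deterministic order", assuming every operation was tagged with a (timestamp, node id) pair so that all nodes sort the merged history the same way; the data layout is invented for this example:

```python
def merge_and_replay(side_a_ops, side_b_ops, apply):
    """Replay the union of both sides' operations in one total order,
    so every node converges to the same final state."""
    for ts, node_id, op in sorted(side_a_ops + side_b_ops):   # (ts, node_id) breaks ties identically everywhere
        apply(op)

state = []
merge_and_replay(
    [(1, "a", "set x=1"), (3, "a", "set y=2")],   # operations accepted by side A
    [(2, "b", "set x=5")],                        # operations accepted by side B
    apply=state.append,
)
print(state)   # ['set x=1', 'set x=5', 'set y=2'] -- identical on every node
```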

《BASE: An Acid Alternative》

I used to keep comparing BASE with CAP. Judging by when they were proposed, however, the idea of BASE appeared around 1990, while CAP was first proposed in 1998. This suggests that BASE was not originally intended as an extension of CAP, but rather as a counterpoint to ACID. Two sentences in the former article confirm this view:

  1. The first round of controversy over "data consistency vs. availability" manifested itself as the dispute between ACID and BASE.
  2. ACID and BASE represent two diametrically opposed design philosophies, sitting at the two poles of the consistency-availability spectrum.

Obviously these doubts do not arise for a single-node database, because single-node transactions are already very mature. Once the system becomes distributed, everything changes: if we insist on ACID, the price is very high. The usual choice, 2PC, is clearly problematic. As a consensus protocol, 2PC guarantees that all participants agree that "the transaction either commits or aborts". 2PC is safe, in that no bad data is ever written to the database, but its liveness is poor: if the transaction coordinator fails at the wrong moment, the system blocks, as the sketch below illustrates. Beyond that, 2PC also suffers from synchronous blocking and possible data inconsistency.
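A deliberately simplified 2PC sketch, only to show where the blocking problem comes from: once a participant has voted "yes" it must hold its locks until it hears the coordinator's decision, so a coordinator crash at that moment leaves everyone stuck. This is an illustrative toy, not a faithful implementation of the protocol:

```python
class Participant:
    def __init__(self):
        self.locked = False
    def prepare(self) -> bool:
        self.locked = True          # resources stay locked while waiting for the verdict
        return True                 # vote yes
    def commit(self):
        self.locked = False
    def abort(self):
        self.locked = False

def two_phase_commit(participants, coordinator_alive=True):
    votes = [p.prepare() for p in participants]        # phase 1: collect votes
    if not coordinator_alive:
        # Coordinator crashes after the votes: no one can decide on its behalf,
        # and every participant stays blocked with its locks held.
        return "blocked"
    decision = all(votes)                              # phase 2: broadcast the decision
    for p in participants:
        p.commit() if decision else p.abort()
    return "committed" if decision else "aborted"

print(two_phase_commit([Participant(), Participant()]))                          # committed
print(two_phase_commit([Participant(), Participant()], coordinator_alive=False)) # blocked
```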

In other words, implementing ACID with 2PC reduces availability:

For example, if we assume each database has 99.9 percent availability, then the availability of the transaction becomes 99.8 percent, or an additional downtime of 43 minutes per month.
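A quick back-of-the-envelope check of that arithmetic (the 30-day month is an assumption made only for this example):

```python
# Both databases must be up for a 2PC transaction to succeed.
per_db = 0.999
txn_availability = per_db * per_db                       # 0.998001, i.e. about 99.8%

month_minutes = 30 * 24 * 60                             # assume a 30-day month
downtime_single = (1 - per_db) * month_minutes           # ~43.2 min/month for one database
downtime_txn = (1 - txn_availability) * month_minutes    # ~86.4 min/month for the transaction
extra = downtime_txn - downtime_single                   # ~43.2 additional minutes per month
print(round(txn_availability, 6), round(extra, 1))       # 0.998001 43.2
```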

For availability, we can instead settle for a more relaxed form of consistency, namely eventual consistency, which already meets our needs in many cases. "Basically Available" means that partial failures are tolerated while the system as a whole keeps working. The example given in the article is:

if users are partitioned across five database servers, BASE design encourages crafting operations in such a way that a user database failure impacts only the 20 percent of the users on that particular host. There is no magic involved, but this does lead to higher perceived availability of the system.

Let's take a concrete example. Suppose a transaction consists of three steps: modifying the shopping cart, modifying the inventory, and payment. We do not want a failure in the shopping-cart step, which is not that important, to block the execution of the other, core steps. We can choose to execute the last two first and let the shopping-cart update catch up after that component recovers; eventual consistency is what buys the high availability of the system as a whole. How to actually achieve this is a problem in its own right; the article proposes introducing a message queue, roughly as in the sketch below.
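A purely illustrative decomposition with a message queue; all of the function names are invented, and `queue.Queue` stands in for a durable message queue. The critical steps run synchronously, while the less critical cart update is persisted as a message and applied later, so its failure cannot block payment:

```python
import queue

cart_updates = queue.Queue()                     # stand-in for a durable message queue

def reserve_inventory(order): print("inventory reserved for", order)
def charge_payment(order):    print("payment charged for", order)
def apply_cart_update(order): print("cart cleared for", order)

def place_order(order):
    reserve_inventory(order)                     # core step, executed now
    charge_payment(order)                        # core step, executed now
    cart_updates.put(order)                      # deferred step, eventually consistent

def drain_cart_updates():
    while not cart_updates.empty():              # run once the cart component is healthy again
        apply_cart_update(cart_updates.get())

place_order("order-42")
drain_cart_updates()
```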

Many people still have only a vague understanding of "soft state", so let me try to explain the term with the example from the article (you may need to read the original to see exactly how the message queue is used). In a banking system, user A transfers a sum of money to user B. With a BASE-style design, debiting A and crediting B are two separate steps, and the credit to B sits in a message queue in between. There may therefore be a window of time during which the money has left A but has not yet reached B; it exists only as a persisted message in the queue. That in-between state is the soft state. If the window is short enough, users can tolerate it or never even notice it, while the availability of the system improves greatly. A sketch of that window follows.
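A toy version of that transfer, with the names and amounts invented for illustration; the point is the moment when the money is visible nowhere except as a queued message:

```python
import queue

balances = {"A": 100, "B": 0}
credits = queue.Queue()                  # stand-in for a durable message queue

def transfer(src, dst, amount):
    balances[src] -= amount              # step 1: debit A immediately
    credits.put((dst, amount))           # step 2: the credit to B is only a persisted message

transfer("A", "B", 30)
print(balances)                          # {'A': 70, 'B': 0}  <- the soft state: money "in flight"

dst, amount = credits.get()              # later, a consumer applies the credit
balances[dst] += amount
print(balances)                          # {'A': 70, 'B': 30} <- eventually consistent again
```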

The relationship between BASE and CAP is also worth mentioning, because my earlier misunderstanding of BASE ran deep, and I suspect I am not alone: I have seen many posts claiming that "BASE gradually evolved out of the CAP theorem". Given the history above, that statement is clearly problematic; on the other hand it is not entirely unreasonable, because BASE was only formally written up in 2008. And as mentioned earlier, CAP lets us choose both C and A when there is no partition, yet we often decline to do so for efficiency reasons; that is exactly where BASE comes in. The precise formulation is: even if strong consistency cannot be achieved, each application can, according to its own business characteristics, adopt an appropriate approach to bring the system to eventual consistency.

For these three theories, the most balanced summary I have seen so far is: "For commercial software, a drop in availability is intolerable. The CAP theorem tells us that we cannot have both availability and consistency, but most of the time we do not need to sacrifice either one completely. BASE was then proposed to strike a balance between the two according to concrete needs, that is, by relaxing ACID's strict consistency, the availability and scalability of the system can be improved."

My understanding may deviate from reality in places; comments and corrections are welcome!

References:

  1. Blog post " Twelve Years of CAP Theory: "The Rules" Changed "
  2. 博文《CAP Twelve Years Later: How the “Rules” Have Changed
  3. Blog post " CAP Details "
  4. Hirofumi 《BASE: An Acid Alternative
  5. Blog post " Talk about a little understanding of distributed consistency "
  6. 博文《A brief history of Consensus, 2PC and Transaction
  7. Blog post " CAP Principle (CAP Theorem), BASE Theory "
  8. Blog " Talk about ACID, CAP and BASE "
