To learn distributed systems, you first have to understand the CAP theorem

In July 2000, Professor Eric Brewer of the University of California, Berkeley, proposed the CAP conjecture at the ACM PODC conference. Two years later, Seth Gilbert and Nancy Lynch of MIT proved it formally. Since then, CAP has been an accepted theorem of distributed computing.

Whether you are a systems architect or an ordinary developer, you cannot get around CAP theory when you design or develop a distributed system. This article explains what CAP theory is, how it is proved, and how to weigh the trade-offs it imposes.

CAP theory overview

CAP theory: a distributed system can satisfy at most two of the three properties consistency (Consistency), availability (Availability) and partition tolerance (Partition tolerance).

Note that the C and A of CAP theory are not the same thing as the C and A of ACID in database transactions. In both, C stands for consistency (Consistency). The A in CAP stands for availability (Availability), while the A in ACID stands for atomicity (Atomicity). Do not confuse the two.

CAP definition

Consistency

Consistency means "all nodes see the same data at the same time": once an update has succeeded and returned to the client, every node holds the same data at the same moment. Consistency here therefore means data consistency across the distributed system.

Consistency can be viewed from two perspectives, the client and the server. From the client's perspective, consistency is mainly about how concurrent accesses obtain updated data. From the server's perspective, it is about how an update is replicated across the whole system so that the data ends up consistent.

Consistency is only a problem when reads and writes are concurrent, so it must always be considered together with concurrent read/write scenarios.

From the client's perspective, when multiple processes access the data concurrently, the strategy that decides how each process obtains updated data determines the consistency level.

Three consistency strategies

  • In a relational database, updated data must be visible to every subsequent access. This is strong consistency.

  • If the system can tolerate some or all subsequent accesses not seeing the update, it is weak consistency.

  • If the updated data is required to become visible after some period of time, it is eventual consistency.

The consistency that CAP says cannot be satisfied together with the other two properties is strong consistency.
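The difference between these strategies can be illustrated with a toy store that has one primary copy and one replica. This is only a minimal sketch, not the behaviour of any real database: the `Store` class, its `mode` values and the `sync()` step are hypothetical names introduced here for illustration.

```python
class Store:
    """Toy store with a primary copy and one replica (hypothetical, for illustration)."""

    def __init__(self, mode):
        self.mode = mode      # "strong" or "eventual"
        self.primary = {}
        self.replica = {}
        self.pending = []     # updates not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        if self.mode == "strong":
            # Strong consistency: copy to the replica before acknowledging.
            self.replica[key] = value
        else:
            # Eventual consistency: acknowledge now, replicate later.
            self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def sync(self):
        """Background replication step: apply all pending updates."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


strong = Store("strong")
strong.write("x", 1)
print(strong.read_from_replica("x"))    # 1: visible immediately

eventual = Store("eventual")
eventual.write("x", 1)
print(eventual.read_from_replica("x"))  # None: a stale read is possible
eventual.sync()
print(eventual.read_from_replica("x"))  # 1: visible after replication
```

A weakly consistent system would look like the "eventual" mode, but without any promise that `sync()` ever runs.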

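Availability

Availability means "Reads and writes always succeed": the service is always available and responds within its normal response time.

In an available distributed system, every non-failing node must respond to every request. This is why a system's availability is usually measured by its downtime.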

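Availability class                        Availability (%)   Tolerable downtime per year
Fault-tolerant availability               99.9999            < 1 min
Extremely high availability               99.999             < 5.3 min
Availability with automatic recovery      99.99              < 53 min
High availability                         99.9               < 8.8 h
Commodity availability                    99                 < 87.6 h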

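When describing a system's availability, we might say that Taobao's availability reaches five nines. This means its availability level is 99.999%, so total downtime over a year is at most (1-0.99999)*365*24*60 = 5.256 min. That is an extremely demanding requirement.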
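The same arithmetic can be applied to any availability level. The helper below is a small sketch; the function name `annual_downtime_minutes` is chosen here for illustration, and a 365-day year is assumed, as in the calculation above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent):
    """Maximum downtime per year, in minutes, for a given availability level."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for level in (99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{level}% -> {annual_downtime_minutes(level):.2f} min/year")
```

Each additional nine divides the tolerable downtime by ten, which is why the step from three nines (hours per year) to five nines (minutes per year) is so costly.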

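Good availability means the system serves users well, without failed operations, access timeouts or similar bad experiences. A distributed system involves many upstream and downstream components, such as load balancers, web servers, application code and database servers, and instability in any one of them can affect availability.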
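To see why one unstable component matters, consider a request that has to pass through every component in turn. Assuming the components fail independently and all of them are required, the end-to-end availability is the product of the individual availabilities. The figures below are invented for illustration, not measurements of any real system.

```python
from math import prod

# Hypothetical per-component availabilities along one request path.
components = {
    "load balancer": 0.9999,
    "web server": 0.999,
    "application": 0.999,
    "database": 0.9995,
}

# If every component is required and failures are independent,
# the chain is available only when all components are available.
end_to_end = prod(components.values())
print(f"end-to-end availability: {end_to_end:.4%}")
```

Under these assumptions the chain is always less available than its weakest component.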

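Partition tolerance

Partition tolerance means "the system continues to operate despite arbitrary message loss or failure of part of the system": when a node fails or the network partitions, the distributed system can still provide service that satisfies consistency and availability.

Partition tolerance is closely related to scalability. In a distributed application, the distribution itself can keep the system from running properly. Good partition tolerance means that the application, although distributed, behaves as if it were a single working whole. For example, if one or several machines go down and the remaining machines still meet the system's needs, or if a network fault splits the system into several isolated parts and each part can still keep the system operating, the system has good partition tolerance.

Put simply, a system that keeps working when the network is interrupted or messages are lost has good partition tolerance.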

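Proof of CAP

The basic scenario for the proof, shown in the figure above, is a network with two nodes, N1 and N2. Think of N1 and N2 as two computers connected by a network. N1 hosts an application A and a database V; N2 hosts an application B and a database V. A and B are two parts of the distributed system, and the two copies of V are the two sub-databases of the system's data store.

When consistency holds, the data in N1 and N2 is the same: both hold V0. When availability holds, a user gets an immediate response whether the request goes to N1 or to N2. When partition tolerance holds, neither a crash of N1 or N2 nor a broken network link prevents the other node from operating normally.

The figure above shows the normal flow. A user asks N1 to update the data, and application A updates database V0 to V1. The distributed system then runs the synchronization operation M, copying V1 from N1 to N2 so that N2's V0 also becomes V1. N2 then uses this data to answer requests sent to it.

Here, consistency is whether the data in the databases V on N1 and N2 is the same; availability is whether N1 and N2 respond to external requests; partition tolerance concerns the network between N1 and N2. This is the normal, ideal scenario. Reality is harsher: when faults occur, can consistency, availability and partition tolerance all hold at once, or must something be given up?

The biggest difference between a distributed system and a single-machine system is the network. Now assume an extreme case: the network between N1 and N2 is cut. Supporting this failure means satisfying partition tolerance. Can consistency and availability both be satisfied as well, or must we choose between them?

Suppose that while the network between N1 and N2 is down, a user sends an update request to N1, so N1's data V0 becomes V1. Because the network is down, the synchronization operation M fails and N2's data remains V0. A user now sends a read request to N2. Since the data has not been synchronized, the application cannot immediately return the latest data V1. What should it do?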

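There are two choices:

  • Sacrifice data consistency to keep availability: respond to the user with the old data V0.

  • Sacrifice availability to keep data consistency: block until the network connection recovers and the synchronization operation M completes, then respond to the user with the latest data V1.

This shows that a distributed system that must tolerate partitions can choose only one of consistency and availability.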
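The scenario in the proof can be replayed in a few lines of code. The sketch below is a toy model under simplifying assumptions: the `Cluster` class, its `prefer` setting and the `heal()` method are hypothetical names, the synchronization M is modelled as a plain assignment, and "blocking" is modelled as raising `TimeoutError`.

```python
class Cluster:
    """Toy model of the two nodes N1 and N2, each holding a copy of V."""

    def __init__(self, prefer):
        self.prefer = prefer      # "consistency" (CP) or "availability" (AP)
        self.v_n1 = "V0"          # N1's copy of the data
        self.v_n2 = "V0"          # N2's copy of the data
        self.connected = True     # state of the link between N1 and N2

    def write_n1(self, value):
        self.v_n1 = value
        if self.connected:
            self.v_n2 = value     # synchronization M succeeds

    def read_n2(self):
        if not self.connected and self.prefer == "consistency":
            # CP: N2 cannot know whether it missed an update, so it refuses.
            raise TimeoutError("N2 is unavailable until the partition heals")
        # AP, or no partition: answer from the local copy (possibly stale).
        return self.v_n2

    def heal(self):
        self.connected = True
        self.v_n2 = self.v_n1     # M finally completes


for prefer in ("availability", "consistency"):
    c = Cluster(prefer)
    c.connected = False           # the network between N1 and N2 is cut
    c.write_n1("V1")
    try:
        print(prefer, "->", c.read_n2())
    except TimeoutError as exc:
        print(prefer, "->", exc)
    c.heal()
    print("after heal ->", c.read_n2())
```

With `prefer="availability"` the read during the partition returns the stale V0; with `prefer="consistency"` it fails instead. In both cases the read returns V1 once the partition heals.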

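CAP trade-offs

From CAP theory and the proof above, we know that consistency, availability and partition tolerance cannot all be satisfied at once. Which one should be given up?

Let us look at the three cases in turn.

CA without P

This case almost never exists in distributed systems. In a distributed environment, network partitions are a fact of life. Because partitions are inevitable, giving up P means giving up being a distributed system, and then there is no point in discussing CAP at all. This is also why the proof above took P as given and showed that C and A cannot both hold.

Familiar relational databases such as MySQL and Oracle guarantee availability and data consistency, but they are not distributed systems. As soon as a relational database has to deal with primary/standby replication or cluster deployment, P must be taken into account.

In fact, C, A and P are not equals in CAP theory. The father of CAP, Eric Brewer, wrote in the article "Spanner, TrueTime and the CAP Theorem":

If Spanner has anything special, it is Google's wide-area network. Google guarantees P by building a private network and through strong network engineering; after years of operational improvement, partitions are kept to a minimum in production, which makes high availability possible.

The conclusion to draw from Google's experience is that P cannot be improved by lowering C or A. Improving a system's partition tolerance requires improving the stability of its infrastructure.

So for a distributed system, P is a basic requirement. Among the three, the only trade-off is between C and A, and every effort should be made to improve P.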

CP without A

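If a distributed system does not require strong availability, that is, if downtime or long periods without a response are acceptable, it can guarantee CP and give up A.

A distributed system that guarantees CP and gives up A sacrifices user experience whenever a network fault or message loss occurs: users have to wait until all data is consistent again before they can access the system.

Quite a few systems are designed as CP. The most typical are distributed databases: in extreme situations they give priority to strong data consistency, at the cost of availability. Examples include Redis and HBase, and ZooKeeper, widely used in distributed systems, also chooses CP.

Whether for distributed storage systems such as Redis and HBase or for a distributed coordination component such as ZooKeeper, data consistency is the most basic requirement. What use is a distributed store that cannot even guarantee data consistency?

ZooKeeper is CP (consistency plus partition tolerance): a request to ZooKeeper at any moment gets a consistent result, and the system tolerates network partitions. It does not, however, guarantee availability for every request: in extreme conditions ZooKeeper may drop some requests, and the consumer program has to retry to get a result. ZooKeeper is a distributed coordination service whose job is to keep data synchronized and consistent across all the services it manages, so it is easy to see why it was designed as CP rather than AP.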
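One common way for a CP system to behave like this is a majority quorum: a group of servers only accepts requests while it can reach a strict majority of the ensemble, so the minority side of a partition rejects requests instead of diverging. The sketch below shows only that rule. It is not ZooKeeper's API; `has_quorum` and `handle_write` are hypothetical names introduced for illustration.

```python
def has_quorum(reachable, ensemble_size):
    """True if this side of a partition can see a strict majority of servers."""
    return reachable > ensemble_size // 2


def handle_write(reachable, ensemble_size):
    """Accept a write only when a majority of the ensemble is reachable."""
    if not has_quorum(reachable, ensemble_size):
        raise ConnectionError("no quorum: request rejected, client should retry")
    return "committed"


# A 5-server ensemble split 3 / 2 by a network partition.
print(handle_write(3, 5))   # the majority side keeps serving
try:
    handle_write(2, 5)      # the minority side refuses
except ConnectionError as exc:
    print(exc)
```

The clients connected to the minority side see failed requests (lost availability), but no two sides can ever commit conflicting data (consistency is preserved).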

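AP without C

To be highly available while allowing partitions, a system has to give up consistency. When a network problem occurs, nodes may lose contact with each other. To stay highly available, each user request must get an immediate response, so each node can only serve from its local data, and this makes the global data inconsistent.

There are many scenarios and cases that give up strong consistency to keep partition tolerance and availability. As mentioned in the availability section, many systems go to great lengths to reach N nines of annual availability. Many business systems, such as shopping on Taobao or buying tickets on 12306, therefore choose availability over consistency.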

You have surely run into this when buying a ticket on 12306: the site tells you tickets are available (although they may in fact already be gone), so you enter the verification code and place the order as usual. A little later the system tells you the order failed because there were not enough tickets left. The system first makes sure it can serve normally in terms of availability and sacrifices some data consistency. This hurts the experience of some users, but it does not seriously block the user flow.

However, saying that many sites sacrifice consistency for availability is not quite accurate. In the ticket example, only strong consistency is given up; the system settles for eventual consistency instead. The ticket inventory data may be inconsistent at the moment the order is placed, but after a while consistency must still be reached.
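The ticket scenario can be modelled with two views of the same inventory: an authoritative count checked when an order is placed, and a cached count shown to users that is refreshed later. This is a simplified sketch of the idea only; it does not describe how 12306 is actually built, and the `TicketSystem` class and its methods are hypothetical.

```python
class TicketSystem:
    """Toy model: a stale-but-available display plus an authoritative inventory."""

    def __init__(self, tickets):
        self.inventory = tickets   # authoritative count
        self.cached = tickets      # count shown to users, refreshed lazily

    def tickets_shown(self):
        return self.cached         # always answers, but may be stale

    def place_order(self):
        if self.inventory == 0:
            return "order failed: not enough tickets"
        self.inventory -= 1
        return "order placed"

    def refresh_cache(self):
        self.cached = self.inventory   # the two views converge


system = TicketSystem(tickets=1)
print(system.place_order())     # order placed
print(system.tickets_shown())   # 1 (stale: the real inventory is now 0)
print(system.place_order())     # order failed: not enough tickets
system.refresh_cache()
print(system.tickets_shown())   # 0
```

The display is never blocked, so availability is kept; the price is a window in which users see a ticket that no longer exists, and that window closes once the cache is refreshed.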

Most large Internet applications run on many hosts in dispersed deployments, and clusters keep growing, so node failures and network failures are the norm. Service availability must still reach N nines, which means guaranteeing P and A and giving up C (settling for eventual consistency). This affects the customer experience in places, but not to the point of seriously disrupting the user flow.

The right fit is the best choice

The sections above described how to weigh the CAP trade-offs, with typical cases. There is no general answer as to which choice is better; it can only be decided case by case, and the right fit is the best choice.

For scenarios that involve money and allow no compromise at all, C must be guaranteed. The system would rather stop serving when the network fails, which means guaranteeing CA and giving up P. For example, in the incident a few years ago when Alipay's fiber-optic cable was cut, Alipay chose data consistency over availability during the network failure. Users experienced a long Alipay outage, but behind the scenes countless engineers were recovering data to keep it consistent.

For other scenarios, the more common practice is to choose availability and partition tolerance, give up strong consistency, and settle for eventual consistency to keep the data safe. This is in fact another theory in the distributed field: BASE theory. We will cover it in the next article.

Summary

Whether you are an architect or an ordinary developer, designing or developing a distributed system inevitably involves CAP trade-offs. Choose the solution that best fits your own system's actual situation.


Origin blog.csdn.net/Java__xiaoze/article/details/90639827