The CAP Theorem, "Re-explained"

Wikipedia explains the CAP theorem as follows:

Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.

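More precisely, Availability has a specific meaning in the context of CAP: if you can reach a node in the cluster, that node can read and write data. Partition tolerance means that the cluster keeps working even when a communication failure splits it into several groups of nodes that cannot talk to each other.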

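In theory, a CA cluster is possible. However, it means that as soon as a partition occurs, every node in the cluster shuts down, so that no node can be reached at all. By the usual definition of "availability", that is a lack of availability. CAP, however, defines availability this way: "every request received by a nonfailing node in the system must result in a response". Under that definition, nodes that have been shut down and therefore do not answer cannot be counted as a lack of availability.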

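This means you really can build a CA cluster, but you have to make sure that partitions happen rarely or never. Within a single data center, at least, that is achievable, but the cost is very high. To shut down every node in the cluster when a partition occurs, you have to detect the partition promptly, and that in itself is hard to do.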
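To make the detection problem concrete, here is a minimal sketch of a heartbeat-based detector, assuming each node records when it last heard from every peer. The class name `HeartbeatDetector`, its methods, and the timeout value are hypothetical and chosen only for illustration. Note what the sketch cannot do: a silent peer may be partitioned away, crashed, or just slow, and a timeout cannot tell these apart, which is part of why prompt detection is hard.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Suspects a partition when a peer has been silent for too long."""

    timeout: float  # seconds of silence tolerated (an assumed tuning knob)
    last_seen: dict = field(default_factory=dict)  # peer name -> last heartbeat time

    def record_heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspected_peers(self, now: float) -> list:
        # A peer is suspected once its silence exceeds the timeout.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)

    def should_shut_down(self, now: float) -> bool:
        # CA-style policy: stop serving as soon as any peer looks unreachable.
        return bool(self.suspected_peers(now))


detector = HeartbeatDetector(timeout=2.0)
detector.record_heartbeat("node-b", now=0.0)
detector.record_heartbeat("node-c", now=0.0)
detector.record_heartbeat("node-b", now=3.0)  # node-c has gone quiet
print(detector.suspected_peers(now=4.0))      # ['node-c']
print(detector.should_shut_down(now=4.0))     # True
```

A shorter timeout reacts faster but shuts the cluster down on ordinary network hiccups; a longer one lets inconsistent operations slip through before the partition is noticed.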

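Although the CAP theorem is usually understood as "pick two out of three", what it actually says is that in a distributed system that may suffer partitions, you have to make a trade-off between consistency and availability. That does not mean you can only choose one of the two. Usually you can give up some consistency to gain some availability, and vice versa. Your system may be neither 100% consistent nor 100% available, but rather a reasonable combination that meets your specific needs. A system that respects CAP can also look like this: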

Because partitions are rare, CAP should allow perfect C and A most of the time, but when partitions are present or perceived, a strategy that detects partitions and explicitly accounts for them is in order. This strategy should have three steps: detect partitions, enter an explicit partition mode that can limit some operations, and initiate a recovery process to restore consistency and compensate for mistakes made during a partition.
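The three steps in the quote can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class `PartitionAwareStore`, the `risky` flag, and the "local pending writes win" merge rule are all hypothetical, and how the partition is detected is left outside the sketch. A real system would pick its own rules for which operations to limit and how to compensate.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    PARTITIONED = "partitioned"


class PartitionAwareStore:
    """Toy key-value store with an explicit partition mode."""

    def __init__(self):
        self.mode = Mode.NORMAL
        self.data = {}
        self.pending = []  # writes accepted during a partition, replayed on recovery

    def on_partition_detected(self):
        # Step 1 happens elsewhere (detection); step 2 is entering partition mode.
        self.mode = Mode.PARTITIONED

    def write(self, key, value, risky=False):
        if self.mode is Mode.PARTITIONED:
            if risky:
                return False  # limit operations that cannot be safely repaired later
            self.pending.append((key, value))
        self.data[key] = value
        return True

    def on_partition_healed(self, remote_data):
        # Step 3: merge the other side's state, replay local writes on top
        # (assumed rule: local pending writes win), and report conflicting keys
        # so the application can compensate.
        conflicts = []
        merged = {**self.data, **remote_data}
        for key, value in self.pending:
            if key in remote_data and remote_data[key] != value:
                conflicts.append(key)
            merged[key] = value
        self.data = merged
        self.pending = []
        self.mode = Mode.NORMAL
        return conflicts


store = PartitionAwareStore()
store.write("a", 1)
store.on_partition_detected()
store.write("a", 2)                       # accepted and remembered
print(store.write("b", 9, risky=True))    # False: refused in partition mode
print(store.on_partition_healed({"a": 5, "c": 7}))  # ['a']
print(store.data)                         # {'a': 2, 'c': 7}
```

Outside of partition mode the store behaves like an ordinary one, which mirrors the point of the quote: the cost is paid only while a partition is present or suspected.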

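It is usually easier to understand the trade-off between consistency and availability by thinking of availability as latency. We can improve consistency by having more nodes take part in each data operation, but every additional node increases the response time. We can treat availability as the upper limit on the latency we are willing to accept. Once the latency exceeds that limit, we consider the data unavailable, which matches how availability is defined in the context of CAP.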
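A small sketch may help here, assuming a request completes once a required number of replicas have acknowledged it and that all replicas are contacted in parallel. The function names, the latency figures, and the 100 ms deadline are made up for illustration; they do not come from any particular system.

```python
def response_time(replica_latencies_ms, acks_required):
    """Time until the acks_required-th fastest replica has answered."""
    if not 1 <= acks_required <= len(replica_latencies_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(replica_latencies_ms)[acks_required - 1]


def is_available(replica_latencies_ms, acks_required, deadline_ms):
    """Treat the request as unavailable once it exceeds the latency limit."""
    return response_time(replica_latencies_ms, acks_required) <= deadline_ms


latencies = [5, 12, 40, 300]  # hypothetical per-replica latencies in milliseconds
for acks in range(1, len(latencies) + 1):
    print(acks, response_time(latencies, acks), is_available(latencies, acks, 100))
# 1 5 True
# 2 12 True
# 3 40 True
# 4 300 False
```

Requiring every replica to acknowledge gives the strongest agreement among nodes, but the slowest replica then sets the response time, and with a 100 ms limit the request counts as unavailable.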
