Paxos Made Simple (Translation)

1 Introduction

The Paxos algorithm for implementing a fault-tolerant distributed system has been regarded as difficult to understand, perhaps because the original presentation was Greek to many readers. In fact, it is among the simplest and most obvious of distributed algorithms. At its heart is a consensus algorithm, the “synod” algorithm of that original presentation. The next section shows that this consensus algorithm follows almost unavoidably from the properties we want it to satisfy. The last section explains the complete Paxos algorithm, which is obtained by the straightforward application of consensus to the state machine approach for building a distributed system, an approach that should be well-known, since it is the subject of what is probably the most often-cited article on the theory of distributed systems.

2 The Consensus Algorithm

2.1 The Problem

Assume a collection of processes that can propose values. A consensus algorithm ensures that a single one among the proposed values is chosen. If no value is proposed, then no value should be chosen. If a value has been chosen, then processes should be able to learn the chosen value. The safety requirements for consensus are:

• Only a value that has been proposed may be chosen,
• Only a single value is chosen, and
• A process never learns that a value has been chosen unless it actually has been.

We won’t try to specify precise liveness requirements. However, the goal is to ensure that some proposed value is eventually chosen and, if a value has been chosen, then a process can eventually learn the value.

We let the three roles in the consensus algorithm be performed by three classes of agents: proposers, acceptors, and learners. In an implementation, a single process may act as more than one agent, but the mapping from agents to processes does not concern us here.

Assume that agents can communicate with one another by sending messages. We use the customary asynchronous, non-Byzantine model, in which:

• Agents operate at arbitrary speed, may fail by stopping, and may restart. Since all agents may fail after a value is chosen and then restart, a solution is impossible unless some information can be remembered by an agent that has failed and restarted.
• Messages can take arbitrarily long to be delivered, can be duplicated, and can be lost, but they are not corrupted.

2.2 Choosing a Value

The easiest way to choose a value is to have a single acceptor agent. A proposer sends a proposal to the acceptor, who chooses the first proposed value that it receives. Although simple, this solution is unsatisfactory because the failure of the acceptor makes any further progress impossible.

So, let’s try another way of choosing a value. Instead of a single acceptor, let’s use multiple acceptor agents. A proposer sends a proposed value to a set of acceptors. An acceptor may accept the proposed value. The value is chosen when a large enough set of acceptors have accepted it. How large is large enough? To ensure that only a single value is chosen, we can let a large enough set consist of any majority of the agents. Because any two majorities have at least one acceptor in common, this works if an acceptor can accept at most one value. (There is an obvious generalization of a majority that has been observed in numerous papers, apparently starting with [3], “The Implementation of Reliable Distributed Multiprocess Systems”.)
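The claim that any two majorities share an acceptor can be checked by brute force for a small system. The following is only an illustrative sketch; the helper names `majorities` and `all_majorities_intersect` and the acceptor labels are assumptions made for this example, not part of the algorithm's description.

```python
from itertools import combinations

def majorities(acceptors):
    """Yield every subset of `acceptors` containing more than half of them."""
    quorum = len(acceptors) // 2 + 1
    for size in range(quorum, len(acceptors) + 1):
        for subset in combinations(sorted(acceptors), size):
            yield frozenset(subset)

def all_majorities_intersect(acceptors):
    """Return True if every pair of majorities has at least one common member."""
    quorums = list(majorities(acceptors))
    return all(a & b for a in quorums for b in quorums)

# Hypothetical five-acceptor system.
acceptors = {"a1", "a2", "a3", "a4", "a5"}
print(all_majorities_intersect(acceptors))
```

Because each acceptor in the intersection can accept at most one value under this scheme, two different values can never both reach a majority.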

In the absence of failure or message loss, we want a value to be chosen even if only one value is proposed by a single proposer. This suggests the requirement:

P1. An acceptor must accept the first proposal that it receives.

But this requirement raises a problem. Several values could be proposed by different proposers at about the same time, leading to a situation in which every acceptor has accepted a value, but no single value is accepted by a majority of them. Even with just two proposed values, if each is accepted by about half the acceptors, failure of a single acceptor could make it impossible to learn which of the values was chosen.

P1 and the requirement that a value is chosen only when it is accepted by a majority of acceptors imply that an acceptor must be allowed to accept more than one proposal. We keep track of the different proposals that an acceptor may accept by assigning a (natural) number to each proposal, so a proposal consists of a proposal number and a value. To prevent confusion, we require that different proposals have different numbers. How this is achieved depends on the implementation, so for now we just assume it. A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors. In that case, we say that the proposal (as well as its value) has been chosen.
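The text deliberately leaves the numbering scheme to the implementation. One possible scheme, shown here purely as an assumption and not as something the text prescribes, gives each proposer a fixed index and interleaves the natural numbers among proposers. The function name `proposal_number` and its parameters are hypothetical.

```python
def proposal_number(round_no, proposer_index, num_proposers):
    """Return a natural number that no other proposer can generate.

    Assumes each proposer has a unique proposer_index in range(num_proposers).
    """
    if not 0 <= proposer_index < num_proposers:
        raise ValueError("proposer_index must be in range(num_proposers)")
    return round_no * num_proposers + proposer_index

# Three proposers, four rounds each: no number is produced twice.
numbers = [proposal_number(r, p, 3) for r in range(4) for p in range(3)]
print(len(numbers) == len(set(numbers)))
```

A proposer that wants a higher number simply increases its own round counter, so numbers from different proposers never collide.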

We can allow multiple proposals to be chosen, but we must guarantee that all chosen proposals have the same value. By induction on the proposal number, it suffices to guarantee:

P2. If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v.

Since numbers are totally ordered, condition P2 guarantees the crucial safety property that only a single value is chosen. To be chosen, a proposal must be accepted by at least one acceptor. So, we can satisfy P2 by satisfying:

P2a. If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.

We still maintain P1 to ensure that some proposal is chosen. Because communication is asynchronous, a proposal could be chosen with some particular acceptor c never having received any proposal. Suppose a new proposer “wakes up” and issues a higher-numbered proposal with a different value. P1 requires c to accept this proposal, violating P2a. Maintaining both P1 and P2a requires strengthening P2a to:

P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.

Since a proposal must be issued by a proposer before it can be accepted by an acceptor, P2b implies P2a, which in turn implies P2.

To discover how to satisfy P2b, let’s consider how we would prove that it holds. We would assume that some proposal with number m and value v is chosen and show that any proposal issued with number n > m also has value v. We would make the proof easier by using induction on n, so we can prove that proposal number n has value v under the additional assumption that every proposal issued with a number in m..(n − 1) has value v, where i..j denotes the set of numbers from i through j. For the proposal numbered m to be chosen, there must be some set C consisting of a majority of acceptors such that every acceptor in C accepted it. Combining this with the induction assumption, the hypothesis that m is chosen implies:

Every acceptor in C has accepted a proposal with number in m..(n − 1), and every proposal with number in m..(n − 1) accepted by any acceptor has value v.

Since any set S consisting of a majority of acceptors contains at least one member of C, we can conclude that a proposal numbered n has value v by ensuring that the following invariant is maintained:

P2c. For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either
(a) no acceptor in S has accepted any proposal numbered less than n, or
(b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.

We can therefore satisfy P2b by maintaining the invariance of P2c.

To maintain the invariance of P2c, a proposer that wants to issue a proposal numbered n must learn the highest-numbered proposal with number less than n, if any, that has been or will be accepted by each acceptor in some majority of acceptors. Learning about proposals already accepted is easy enough; predicting future acceptances is hard. Instead of trying to predict the future, the proposer controls it by extracting a promise that there won’t be any such acceptances. In other words, the proposer requests that the acceptors not accept any more proposals numbered less than n. This leads to the following algorithm for issuing proposals.

1. A proposer chooses a new proposal number n and sends a request to each member of some set of acceptors, asking it to respond with:
(a) A promise never again to accept a proposal numbered less than n, and
(b) The proposal with the highest number less than n that it has accepted, if any.
I will call such a request a prepare request with number n.

2. If the proposer receives the requested responses from a majority of the acceptors, then it can issue a proposal with number n and value v, where v is the value of the highest-numbered proposal among the responses, or is any value selected by the proposer if the responders reported no proposals.

A proposer issues a proposal by sending, to some set of acceptors, a request that the proposal be accepted. (This need not be the same set of acceptors that responded to the initial requests.) Let’s call this an accept request.
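The value-selection rule in step 2 can be sketched as a small function. This is a minimal illustration under assumptions: each response is modeled as either `None` (the acceptor reported no accepted proposal) or a `(number, value)` pair, and the name `choose_value` is hypothetical.

```python
def choose_value(responses, own_value):
    """Pick the value a proposer may issue after a majority has responded.

    responses: one entry per responding acceptor, either None or a
               (number, value) pair for the highest-numbered proposal
               that acceptor has accepted.
    own_value: the value the proposer would like to propose.
    """
    accepted = [p for p in responses if p is not None]
    if not accepted:
        # No responder reported a proposal: the proposer is free to choose.
        return own_value
    # Otherwise the proposer must adopt the value of the highest-numbered one.
    return max(accepted, key=lambda p: p[0])[1]

print(choose_value([None, (3, "x"), (5, "y")], "z"))
print(choose_value([None, None], "z"))
```

The first call is forced to adopt the value of proposal 5, while the second is unconstrained and keeps the proposer's own value.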

This describes a proposer’s algorithm. What about an acceptor? It can receive two kinds of requests from proposers: prepare requests and accept requests. An acceptor can ignore any request without compromising safety. So, we need to say only when it is allowed to respond to a request. It can always respond to a prepare request. It can respond to an accept request, accepting the proposal, if it has not promised not to. In other words:

P1a. An acceptor can accept a proposal numbered n if it has not responded to a prepare request having a number greater than n.

Observe that P1a subsumes P1.

We now have a complete algorithm for choosing a value that satisfies the required safety properties, assuming unique proposal numbers. The final algorithm is obtained by making one small optimization.

Suppose an acceptor receives a prepare request numbered n, but it has already responded to a prepare request numbered greater than n, thereby promising not to accept any new proposal numbered n. There is then no reason for the acceptor to respond to the new prepare request, since it will not accept the proposal numbered n that the proposer wants to issue. So we have the acceptor ignore such a prepare request. We also have it ignore a prepare request for a proposal it has already accepted.

With this optimization, an acceptor needs to remember only the highest-numbered proposal that it has ever accepted and the number of the highest-numbered prepare request to which it has responded. Because P2c must be kept invariant regardless of failures, an acceptor must remember this information even if it fails and then restarts. Note that the proposer can always abandon a proposal and forget all about it, as long as it never tries to issue another proposal with the same number.
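One way an acceptor might keep these two pieces of information across a restart is to write them to stable storage before replying. The sketch below is an assumption-laden illustration: the class name `DurableAcceptorState`, the JSON file layout, and the convention that proposal numbers start at 1 (so 0 means "no prepare answered yet") are all choices made for this example.

```python
import json
import os
import tempfile

class DurableAcceptorState:
    """Holds the only two facts an acceptor must survive a restart with."""

    def __init__(self, path):
        self.path = path
        self.promised = 0      # number of the highest prepare request answered
        self.accepted = None   # [number, value] of the highest accepted proposal
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)
            self.promised = data["promised"]
            self.accepted = data["accepted"]

    def save(self):
        """Write the state atomically; call this before sending any response."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"promised": self.promised, "accepted": self.accepted}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "acceptor.json")
    state = DurableAcceptorState(path)
    state.promised, state.accepted = 7, [5, "v"]
    state.save()
    recovered = DurableAcceptorState(path)   # simulates a restart
    print(recovered.promised, recovered.accepted)
```

Saving before responding matters: a promise that is sent but forgotten after a crash could let the acceptor later accept a proposal it had promised to reject.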

Putting the actions of the proposer and acceptor together, we see that the algorithm operates in the following two phases.

Phase 1.
(a) A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.
(b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

Phase 2.
(a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.
(b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.
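The two phases can be put together in a small in-memory sketch. It makes several simplifying assumptions that the text does not: messages are modeled as direct method calls (so nothing is lost, delayed, or duplicated), the proposer contacts every acceptor rather than just a majority, and proposal numbers start at 1. The names `Acceptor`, `on_prepare`, `on_accept`, and `run_round` are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest prepare number answered (0 = none)
        self.accepted = None   # (number, value) of highest accepted proposal

    def on_prepare(self, n):
        # Phase 1(b): answer only if n exceeds every prepare already answered.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return None            # the request is ignored

    def on_accept(self, n, value):
        # Phase 2(b): accept unless a higher-numbered prepare was answered.
        if n >= self.promised:
            self.accepted = (n, value)
            return True
        return False

def run_round(acceptors, n, own_value):
    """Run both phases for proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    # Phase 1(a): send prepare requests and collect the promises.
    promises = []
    for a in acceptors:
        reply = a.on_prepare(n)
        if reply is not None:
            promises.append((a, reply[1]))
    if len(promises) < majority:
        return None
    # Phase 2(a): adopt the value of the highest-numbered reported proposal.
    prior = [p for _, p in promises if p is not None]
    value = max(prior, key=lambda p: p[0])[1] if prior else own_value
    accepts = sum(a.on_accept(n, value) for a, _ in promises)
    return value if accepts >= majority else None

acceptors = [Acceptor() for _ in range(3)]
print(run_round(acceptors, 1, "A"))   # first proposal is free to pick its value
print(run_round(acceptors, 2, "B"))   # a later proposal must keep the chosen value
print(run_round(acceptors, 1, "C"))   # a stale number gets no promises
```

In this run the second round wants to propose "B" but is forced to re-propose "A", which is how the algorithm keeps a chosen value from changing.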

A proposer can make multiple proposals, so long as it follows the algorithm for each one. It can abandon a proposal in the middle of the protocol at any time. (Correctness is maintained, even though requests and/or responses for the proposal may arrive at their destinations long after the proposal was abandoned.) It is probably a good idea to abandon a proposal if some proposer has begun trying to issue a higher-numbered one. Therefore, if an acceptor ignores a prepare or accept request because it has already received a prepare request with a higher number, then it should probably inform the proposer, who should then abandon its proposal. This is a performance optimization that does not affect correctness.

2.3 Learning a Chosen Value

To learn that a value has been chosen, a learner must find out that a proposal has been accepted by a majority of acceptors. The obvious algorithm is to have each acceptor, whenever it accepts a proposal, respond to all learners, sending them the proposal. This allows learners to find out about a chosen value as soon as possible, but it requires each acceptor to respond to each learner, a number of responses equal to the product of the number of acceptors and the number of learners.
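A learner in this obvious scheme only needs to tally which acceptors have accepted which proposal. The sketch below is illustrative and rests on assumptions: acceptances arrive as `(acceptor_id, number, value)` notifications, and the class name `Learner` and method `on_accepted` are invented for the example.

```python
from collections import defaultdict

class Learner:
    def __init__(self, num_acceptors):
        self.majority = num_acceptors // 2 + 1
        self.votes = defaultdict(set)   # (number, value) -> acceptor ids
        self.chosen = None

    def on_accepted(self, acceptor_id, number, value):
        """Record one acceptance; return the chosen value once it is known."""
        # A set makes duplicated messages harmless.
        self.votes[(number, value)].add(acceptor_id)
        if len(self.votes[(number, value)]) >= self.majority:
            self.chosen = value
        return self.chosen

learner = Learner(num_acceptors=3)
print(learner.on_accepted("a1", 1, "x"))
print(learner.on_accepted("a2", 2, "y"))
print(learner.on_accepted("a1", 1, "x"))   # duplicate delivery
print(learner.on_accepted("a3", 1, "x"))
```

Votes are counted per proposal rather than per value, matching the definition that a value is chosen when a single proposal carrying it has been accepted by a majority.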

The assumption of non-Byzantine failures makes it easy for one learner to find out from another learner that a value has been accepted. We can have the acceptors respond with their acceptances to a distinguished learner, which in turn informs the other learners when a value has been chosen. This approach requires an extra round for all the learners to discover the chosen value. It is also less reliable, since the distinguished learner could fail. But it requires a number of responses equal only to the sum of the number of acceptors and the number of learners.
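The difference between the two response counts grows quickly with system size. The following small calculation uses hypothetical function names and simply restates the product and the sum given in the text.

```python
def responses_all_to_all(num_acceptors, num_learners):
    """Every acceptor responds to every learner."""
    return num_acceptors * num_learners

def responses_via_distinguished(num_acceptors, num_learners):
    """Acceptors respond to one distinguished learner, which informs the rest."""
    return num_acceptors + num_learners

for acceptors, learners in [(3, 3), (5, 4), (7, 20)]:
    print(acceptors, learners,
          responses_all_to_all(acceptors, learners),
          responses_via_distinguished(acceptors, learners))
```

The saving comes at the price of the extra round and the single point of failure described above.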

More generally, the acceptors could respond with their acceptances to some set of distinguished learners, each of which can then inform all the learners when a value has been chosen. Using a larger set of distinguished learners provides greater reliability at the cost of greater communication complexity.

Because of message loss, a value could be chosen with no learner ever finding out. The learner could ask the acceptors what proposals they have accepted, but failure of an acceptor could make it impossible to know whether or not a majority had accepted a particular proposal. In that case, learners will find out what value is chosen only when a new proposal is chosen. If a learner needs to know whether a value has been chosen, it can have a proposer issue a proposal, using the algorithm described above.

2.4 Progress

It’s easy to construct a scenario in which two proposers each keep issuing a sequence of proposals with increasing numbers, none of which are ever chosen. Proposer p completes phase 1 for a proposal number n1. Another proposer q then completes phase 1 for a proposal number n2 > n1. Proposer p’s phase 2 accept requests for a proposal numbered n1 are ignored because the acceptors have all promised not to accept any new proposal numbered less than n2. So, proposer p then begins and completes phase 1 for a new proposal number n3 > n2, causing the second phase 2 accept requests of proposer q to be ignored. And so on.
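The dueling-proposers scenario can be reproduced with a few lines of simulation. This is a sketch under assumptions: messages are direct method calls, both proposers reach all three acceptors, and the `Acceptor` class with `on_prepare` and `on_accept` is a hypothetical minimal model of the acceptor rules.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest prepare number answered (0 = none)
        self.accepted = None   # (number, value) of highest accepted proposal

    def on_prepare(self, n):
        """Return True if the prepare request is answered with a promise."""
        if n > self.promised:
            self.promised = n
            return True
        return False

    def on_accept(self, n, value):
        """Return True if the proposal is accepted."""
        if n >= self.promised:
            self.accepted = (n, value)
            return True
        return False

acceptors = [Acceptor() for _ in range(3)]

n = 1
for a in acceptors:
    a.on_prepare(n)                    # p completes phase 1 for n1
pending = (n, "p")                     # p has not yet sent its accept requests

accepted_counts = []
for _ in range(6):
    n += 1
    rival = "q" if pending[1] == "p" else "p"
    for a in acceptors:
        a.on_prepare(n)                # rival completes phase 1 with a higher number
    # The pending proposer's phase 2 requests now arrive too late.
    stale_n, owner = pending
    accepted_counts.append(
        sum(a.on_accept(stale_n, "value-from-" + owner) for a in acceptors))
    pending = (n, rival)

print(accepted_counts)
```

Every entry in `accepted_counts` is zero: each proposer's accept requests are always overtaken by the other's newer prepare request, so no proposal is ever accepted, let alone chosen.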
