How does Paxos handle packet loss and new nodes joining?

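I have been learning Paxos recently, and by now I have a basic understanding of how it works. But can anyone explain how Paxos handles packet loss and new nodes joining? A simple example would be even better.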

Regarding packet loss, Paxos makes the following assumption about the network:

Messages may be lost, reordered, or duplicated.

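This is handled through quorums. At least a quorum of the acceptors (a majority, in classic Paxos) must accept a value before the system treats it as chosen. The same mechanism covers node failures: a lost message and a crashed node look the same to the rest of the system, and neither blocks progress as long as a quorum still responds.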
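A minimal sketch of the quorum idea, assuming the quorum is a simple majority of the acceptors. The function names `majority` and `is_chosen` and the acceptor ids are made up for illustration; they are not part of any Paxos library.

```python
def majority(n_acceptors):
    """Smallest number of acceptors that forms a majority quorum."""
    return n_acceptors // 2 + 1


def is_chosen(accepted_from, n_acceptors):
    """True once a majority of distinct acceptors has accepted the value.

    Replies are deduplicated by acceptor id, so a duplicated message
    cannot be counted twice. Lost replies are simply absent.
    """
    return len(set(accepted_from)) >= majority(n_acceptors)


# Five acceptors: the replies from a4 and a5 were lost, a1's was duplicated.
replies = ["a1", "a2", "a1", "a3"]
print(is_chosen(replies, 5))             # three distinct acceptors out of five
print(is_chosen(["a1", "a1", "a2"], 5))  # only two distinct acceptors
```

Because only distinct acceptors count, reordering and duplication do not affect the outcome, and up to a minority of replies can go missing.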

As for new nodes joining, Paxos does not concern itself with how nodes discover one another. That problem is solved by other algorithms.

They automagically know all the nodes and each one's role

If needed, a production implementation can use ZooKeeper to handle the detection of new nodes.

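The classic Paxos algorithm has no notion of "new nodes joining". Some Paxos variants do, such as "Vertical Paxos", but the classic algorithm requires all nodes to be defined statically before the algorithm runs.

As for packet loss, Paxos uses a very simple infinite loop: "try a round of the algorithm, if anything at all goes wrong, try another round". So if too many packets are lost during the first attempt to reach a decision (which can be detected with a simple timeout while waiting for replies), a second round can be attempted. If that round times out as well, try again, and so on.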
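The retry loop can be sketched as a small single-process simulation. This is an illustration under stated assumptions, not a production implementation: there is one proposer, ballots are plain increasing integers, a dropped message stands in for a timeout, and the names `Acceptor`, `send`, and `propose` are hypothetical.

```python
import random


class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, value) last accepted, if any

    def on_prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            # Promise, reporting any previously accepted proposal.
            return self.accepted if self.accepted else (-1, None)
        return None  # stale ballot: no promise

    def on_accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def send(rng, loss, handler, *args):
    """Deliver a request and its reply over a lossy link.

    Returns None if either direction is dropped, which is what the
    proposer would observe as a timeout.
    """
    if rng.random() < loss:  # request lost
        return None
    reply = handler(*args)
    if rng.random() < loss:  # reply lost
        return None
    return reply


def propose(value, acceptors, loss=0.3, seed=0, max_rounds=1000):
    rng = random.Random(seed)
    quorum = len(acceptors) // 2 + 1
    for ballot in range(max_rounds):  # each retry uses a higher ballot
        # Phase 1: ask for promises.
        promises = [send(rng, loss, a.on_prepare, ballot) for a in acceptors]
        promises = [p for p in promises if p is not None]
        if len(promises) < quorum:
            continue  # too few replies: try another round
        # Adopt the value of the highest-ballot proposal already accepted.
        prior_ballot, prior_value = max(promises, key=lambda p: p[0])
        chosen = prior_value if prior_ballot >= 0 else value
        # Phase 2: ask the acceptors to accept.
        acks = [send(rng, loss, a.on_accept, ballot, chosen) for a in acceptors]
        if sum(1 for ok in acks if ok) >= quorum:
            return ballot, chosen
    raise RuntimeError("no quorum reached within max_rounds")


print(propose("x", [Acceptor() for _ in range(3)]))
```

With several concurrent proposers, ballot numbers would also have to be unique per proposer, and real implementations usually add randomized backoff between rounds so competing proposers do not keep preempting each other.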

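The Paxos algorithm does not define exactly how packet loss is detected and handled. That is an implementation-specific detail. For production systems this is actually a good thing, because how loss is handled has a considerable performance impact on a Paxos-based system.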

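As the other answers point out, lost or reordered messages are handled by the algorithm itself: it is designed to cope with exactly these cases.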

New nodes joining is a matter of "cluster membership changes". There is a common misconception that Paxos does not cover cluster membership changes; however, they are described in the last paragraph of the 2001 paper Paxos Made Simple. In this blog post I discuss it. There is a question of how a new node gets a copy of all the state when it joins the cluster. That is discussed in
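For illustration, the approach in that last paragraph of Paxos Made Simple is to keep the set of servers in the replicated state itself and change it with ordinary state-machine commands, with the servers for instance i+α determined by the state after executing command i. A rough sketch, assuming 0-based instance numbers and a hypothetical command format of `("add", node)` and `("remove", node)` tuples; the function name `members_for_instance` is also made up:

```python
def members_for_instance(i, initial_members, log, alpha=2):
    """Servers that run consensus instance i.

    Instance i uses the membership as of command i - alpha, so the
    first alpha instances run under the initial configuration.
    """
    members = set(initial_members)
    # Replay commands 0 .. i - alpha (inclusive); none if i < alpha.
    for cmd in log[: max(0, i - alpha + 1)]:
        if cmd[0] == "add":
            members.add(cmd[1])
        elif cmd[0] == "remove":
            members.discard(cmd[1])
        # Any other command does not affect membership.
    return members


log = [("add", "D"), ("noop",), ("remove", "A")]
for i in range(5):
    print(i, sorted(members_for_instance(i, {"A", "B", "C"}, log)))
```

The delay of α instances is what lets a leader propose up to α commands ahead without knowing whether a pending command changes the membership. This sketch does not cover how a newly added node catches up on existing state.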
