How does Raft guarantee that a leader can always be elected?

The Raft paper says:

Raft uses the voting process to prevent a candidate from winning an election unless its log contains all committed entries. A candidate must contact a majority of the cluster in order to be elected, which means that every committed entry must be present in at least one of those servers. If the candidate’s log is at least as up-to-date as any other log in that majority, then it will hold all the committed entries. The RequestVote RPC implements this restriction: the RPC includes information about the candidate’s log, and the voter denies its vote if its own log is more up-to-date than that of the candidate.
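The voter-side check described in that quote can be sketched roughly as follows. The paper defines "more up-to-date" by comparing the term of the last entry first and the log length second. The function names and the list-of-terms log representation here are hypothetical, and the sketch deliberately leaves out the other RequestVote conditions (the candidate's current term and whether the voter has already voted in this term).

```python
def is_at_least_as_up_to_date(cand_last_term, cand_last_index,
                              voter_last_term, voter_last_index):
    """Return True if the candidate's log is at least as up-to-date as the voter's.

    Logs whose last entries have different terms are ordered by that term;
    logs ending in the same term are ordered by length (last index).
    """
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    return cand_last_index >= voter_last_index


def grant_vote(cand_last_term, cand_last_index, voter_log):
    """Decide the log-related part of a RequestVote reply.

    voter_log is a list of entry terms (hypothetical representation);
    the entry at Raft index i is voter_log[i - 1]. An empty log is
    treated as last term 0, last index 0.
    """
    voter_last_index = len(voter_log)
    voter_last_term = voter_log[-1] if voter_log else 0
    return is_at_least_as_up_to_date(cand_last_term, cand_last_index,
                                     voter_last_term, voter_last_index)
```

For instance, `grant_vote(1, 1, [1, 1])` denies the vote, because the voter's log ends in the same term but is longer than the candidate's.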

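But how does it guarantee that there is always at least one electable leader (i.e., a server whose log is at least as up-to-date as those of a majority of the cluster)?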

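For example, suppose we have a cluster of three servers A, B, and C, where A is the leader. The first log entry is stored on A and B, and the second log entry is stored on A and C. Then A crashes, and B and C try to elect a leader. But at this point no majority (i.e., 2 out of 3) of the servers holds both the first and the second entry. So it seems that leader election could never succeed (unless A restarts, but Raft is supposed to tolerate the failure of 1 out of 3 servers).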

The paper defines a "Log Matching Property" that is relevant to this scenario:

• [..]
• If two entries in different logs have the same index and term, then the logs are identical in all preceding entries.

Since A and C both contain the same second entry, C must also contain the first entry. This is guaranteed because:

The second property is guaranteed by a simple consistency check performed by AppendEntries. When sending an AppendEntries RPC, the leader includes the index and term of the entry in its log that immediately precedes the new entries. If the follower does not find an entry in its log with the same index and term, then it refuses the new entries.
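A minimal sketch of that follower-side consistency check, assuming a log stored as a list of `(term, command)` tuples where Raft index `i` lives at list position `i - 1`. The function name and representation are illustrative only, and the sketch omits the leader-term check and commit-index handling that a real AppendEntries handler also performs.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Apply the AppendEntries consistency check to a follower's log.

    log        : list of (term, command) tuples, mutated in place
    prev_index : Raft index of the entry immediately preceding `entries`
                 (0 means the new entries start at the beginning of the log)
    prev_term  : term of that preceding entry
    entries    : list of (term, command) tuples to append

    Returns True if the entries were accepted, False if refused.
    """
    if prev_index > 0:
        # Refuse unless the follower has an entry at prev_index with prev_term.
        if prev_index > len(log) or log[prev_index - 1][0] != prev_term:
            return False

    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based list position of this entry
        if pos < len(log):
            if log[pos][0] != entry[0]:
                # Conflicting entry: drop it and everything after it.
                del log[pos:]
                log.append(entry)
            # Same term at the same index: already present, nothing to do.
        else:
            log.append(entry)
    return True


# The scenario from the question: C has an empty log and A tries to send
# the second entry directly.
c_log = []
skipped = append_entries(c_log, 1, 1, [(1, "second")])  # refused
first = append_entries(c_log, 0, 0, [(1, "first")])     # accepted
second = append_entries(c_log, 1, 1, [(1, "second")])   # accepted now
```

In this sketch the second entry is only accepted once the first one is in place, which is why a follower cannot end up holding entry 2 without entry 1.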

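Until C has the entry that B has, it will refuse any further appends. So at some point in your scenario, C must have received the first entry before it could accept the newer entry from A.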

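Therefore, C is the more up-to-date of B and C. (It will refuse to vote for B as leader.) B, whose log is shorter, will grant its vote to C, which together with C's own vote gives C a majority of 2 out of 3.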
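To make the outcome concrete, here is a small sketch of a single election round between B and C after A has crashed. It assumes both entries were written in term 1, represents each log as a list of entry terms, and ignores term numbers on the RPC, repeated rounds, and the one-vote-per-term rule. All names are hypothetical.

```python
def last_term_and_index(log):
    """Return (term of last entry, last index) for a log given as a list of terms."""
    return (log[-1] if log else 0, len(log))


def grants_vote(voter_log, candidate_log):
    """A voter grants its vote unless its own log is more up-to-date.

    Tuple comparison orders by last term first, then by last index.
    """
    return last_term_and_index(candidate_log) >= last_term_and_index(voter_log)


def count_votes(candidate, logs):
    """Count the candidate's own vote plus votes from the other live servers."""
    votes = 1  # a candidate votes for itself
    for name, log in logs.items():
        if name != candidate and grants_vote(log, logs[candidate]):
            votes += 1
    return votes


CLUSTER_SIZE = 3                 # A, B, C; A has crashed
MAJORITY = CLUSTER_SIZE // 2 + 1

# Logs of the surviving servers: B holds entry 1, C holds entries 1 and 2.
live_logs = {"B": [1], "C": [1, 1]}

c_votes = count_votes("C", live_logs)
b_votes = count_votes("B", live_logs)
c_wins = c_votes >= MAJORITY
b_wins = b_votes >= MAJORITY
```

Under these assumptions C collects two votes (its own and B's) and reaches the majority, while B collects only its own vote.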