raft: some questions about read only queries

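In chapter 6.4 of the Raft dissertation, the following steps are given for bypassing the Raft log for read-only queries while still preserving linearizability: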

  1. If the leader has not yet marked an entry from its current term committed, it waits until it has done so. The Leader Completeness Property guarantees that a leader has all committed entries, but at the start of its term, it may not know which those are. To find out, it needs to commit an entry from its term. Raft handles this by having each leader commit a blank no-op entry into the log at the start of its term. As soon as this no-op entry is committed, the leader’s commit index will be at least as large as any other servers’ during its term.
  2. The leader saves its current commit index in a local variable readIndex. This will be used as a lower bound for the version of the state that the query operates against.
  3. The leader needs to make sure it hasn’t been superseded by a newer leader of which it is unaware. It issues a new round of heartbeats and waits for their acknowledgments from a majority of the cluster. Once these acknowledgments are received, the leader knows that there could not have existed a leader for a greater term at the moment it sent the heartbeats. Thus, the readIndex was, at the time, the largest commit index ever seen by any server in the cluster.
  4. The leader waits for its state machine to advance at least as far as the readIndex; this is current enough to satisfy linearizability.
  5. Finally, the leader issues the query against its state machine and replies to the client with the results.
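The five steps above can be sketched roughly as follows. This is a minimal single-process illustration, not code from the dissertation. The Leader and Entry classes, the heartbeat_round callable (assumed to return how many followers acknowledged one fresh round of heartbeats), and the choice to raise instead of waiting in steps 1 and 3 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class Entry:
    term: int
    command: Optional[Tuple[str, str]]  # (key, value); None models the no-op entry


@dataclass
class Leader:
    term: int
    cluster_size: int
    log: list = field(default_factory=list)  # log[i] holds the entry at index i + 1
    commit_index: int = 0
    last_applied: int = 0
    state: dict = field(default_factory=dict)

    def committed_in_current_term(self) -> bool:
        # Step 1: is some entry from this term at or below commit_index?
        return any(e.term == self.term for e in self.log[: self.commit_index])

    def apply_committed(self) -> None:
        # Apply committed entries to the state machine in log order.
        while self.last_applied < self.commit_index:
            entry = self.log[self.last_applied]
            self.last_applied += 1
            if entry.command is not None:
                key, value = entry.command
                self.state[key] = value

    def read(self, key: str, heartbeat_round: Callable[[], int]) -> Optional[str]:
        # Step 1: a real leader would wait (or queue the read) instead of raising.
        if not self.committed_in_current_term():
            raise RuntimeError("no entry committed in the current term yet")
        # Step 2: remember the commit index as the lower bound for this read.
        read_index = self.commit_index
        # Step 3: one fresh heartbeat round; the leader counts itself.
        acks = heartbeat_round() + 1
        if acks <= self.cluster_size // 2:
            raise RuntimeError("leadership not confirmed by a majority")
        # Step 4: this sketch applies entries inline; a real leader would
        # wait for its apply loop to reach read_index.
        if self.last_applied < read_index:
            self.apply_committed()
        # Step 5: answer from the state machine.
        return self.state.get(key)
```

For example, with a three-server cluster, a log holding one term-1 write to "x" followed by the term-2 no-op, commit_index set to 2, and a heartbeat round acknowledged by one follower, leader.read("x", lambda: 1) returns the value written in term 1.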

My questions:

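a) Is step 1 only for the case where the leader has just been elected? Only a new leader has no entry committed for its current term. And since the no-op entry is needed to find out which entries are currently committed, this step is really always needed once an election completes, and is not specific to read-only queries? In other words, once a leader has been active for a while, it must already have entries committed for its term (including the no-op entry).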

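b) For step 3, does it mean that whenever the leader needs to serve a read-only query, one extra heartbeat is sent, regardless of any currently outstanding heartbeat (sent, but responses from a majority not yet received) or the next scheduled heartbeat?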

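c) Is step 4 only for followers (for the case where followers help offload read-only query processing)? On the leader, a committed index already means the entry has been applied to the local state machine.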

All in all, a leader that has been active for a while normally only needs to do step 3 and step 5, right?

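a: This is indeed only the case when the leader is first elected. In practice, when a read-only query is received, you check whether an entry from the leader's current term has been committed, and queue or reject the query if not.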

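b: In practice, most implementations batch read-only queries for efficiency. You don't need to send many concurrent heartbeats. If a heartbeat is outstanding, the leader can enqueue any new reads to be evaluated after that heartbeat completes. Once a heartbeat completes, if any additional queries have been enqueued, the leader starts another heartbeat. This has the effect of batching linearizable read-only queries for better efficiency.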
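A rough sketch of that batching, under some assumptions: ReadBatcher, send_heartbeat, and on_majority_ack are hypothetical names, each read is a plain callable, and a read that arrives while a round is outstanding is answered only when the following round completes, because that is the first round sent after the read arrived.

```python
from typing import Callable, List

Read = Callable[[], None]


class ReadBatcher:
    def __init__(self, send_heartbeat: Callable[[], None]) -> None:
        # send_heartbeat is a hypothetical hook that starts one heartbeat round.
        self._send_heartbeat = send_heartbeat
        self._in_flight: List[Read] = []  # reads covered by the outstanding round
        self._queued: List[Read] = []     # reads that arrived during that round
        self._outstanding = False

    def submit(self, read: Read) -> None:
        if self._outstanding:
            # Do not send a concurrent heartbeat; wait for the next round.
            self._queued.append(read)
            return
        self._in_flight.append(read)
        self._outstanding = True
        self._send_heartbeat()

    def on_majority_ack(self) -> None:
        # Called once a majority has acknowledged the outstanding round.
        ready = self._in_flight
        self._in_flight, self._queued = self._queued, []
        self._outstanding = bool(self._in_flight)
        if self._outstanding:
            # One new round covers every read queued in the meantime.
            self._send_heartbeat()
        for read in ready:
            read()
```

With this shape, three reads submitted back to back cost two heartbeat rounds rather than three: the first read triggers a round, and the other two share the next one.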

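c: The leader's lastApplied index (the index of its state machine) is not necessarily always equal to its commitIndex. Indeed, this is why there is a lastApplied index in Raft in the first place. The leader does not have to synchronously apply an index at the same time it commits that index. This is really implementation specific. In practice, Raft implementations typically apply entries in a different thread, so an entry can be committed and then enqueued to be applied to the state machine. Some implementations put entries on a queue and let the state machine pull entries from that queue at its own pace, so when an entry will be applied is unspecified. What matters is that a read-only query is applied after the last command committed by the leader has been applied.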
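To illustrate why step 4 still matters on the leader, here is a small sketch in which a separate thread applies entries and a read blocks until lastApplied catches up with the read index. The StateMachine class, its method names, and the key/value state are assumptions for the example, not part of any particular Raft implementation.

```python
import threading


class StateMachine:
    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.last_applied = 0
        self.data = {}

    def apply(self, index: int, key: str, value: str) -> None:
        # Called by the apply thread, in log order, some time after commit.
        with self._cond:
            self.data[key] = value
            self.last_applied = index
            self._cond.notify_all()

    def read_at(self, read_index: int, key: str, timeout: float = None):
        # Block until the state machine has applied at least read_index.
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self.last_applied >= read_index, timeout
            )
            if not caught_up:
                raise TimeoutError("state machine has not reached read_index")
            return self.data.get(key)
```

A read taken at read_index 1 before the apply thread has run simply waits; once apply(1, ...) is called from the other thread, the read returns the freshly applied value.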

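Also, you asked whether this applies only to followers. Linearizable queries can only be evaluated through the leader. I suppose there are some algorithms for doing linearizable reads on followers, but they would be inefficient. Followers can only maintain sequential consistency for queries. In that case, servers respond to client operations with the index of the state machine at the time the operation was evaluated. Clients send their last received index with each operation, and when a server receives an operation it uses the same algorithm to ensure that its state machine's lastApplied index is at least as large as the client's index. This is necessary to ensure that the client does not see state go back in time when switching servers.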
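The index exchange described above might look roughly like this. Server, Client, and their methods are hypothetical names, and the server here reports that it is behind instead of waiting until it catches up, which keeps the sketch single-threaded.

```python
class Server:
    def __init__(self) -> None:
        self.last_applied = 0
        self.data = {}

    def apply(self, index: int, key: str, value: str) -> None:
        self.data[key] = value
        self.last_applied = index

    def query(self, key: str, client_index: int):
        # Refuse to answer from a state older than one the client has seen.
        # A real server would wait here until last_applied catches up.
        if self.last_applied < client_index:
            return None
        # Reply with the value and the index it was evaluated at.
        return self.data.get(key), self.last_applied


class Client:
    def __init__(self) -> None:
        self.index = 0  # highest state machine index seen in any response

    def read(self, server: Server, key: str):
        result = server.query(key, self.index)
        if result is None:
            raise RuntimeError("server is behind this client; retry or wait")
        value, index = result
        self.index = max(self.index, index)
        return value
```

A client that has read at index 2 from one server and then switches to a server that has only applied index 1 is turned away until that server has applied index 2, so the client never observes older state.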

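There are some other complexities to read-only queries beyond what is described in the Raft literature if you want to support FIFO consistency for concurrent operations from a single client. Some of these are described in Copycat's architecture documentation.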