When does a leader ACK the client in DistributedLog?

I'm having a hard time understanding when the leader actually acknowledges the client. This is part of the DistributedLog documentation:

Each batched entry appended to a log segment will be assigned a monotonically increasing entry id by the log segment writer. All the entries are written asynchronously in a pipeline. The log segment writer therefore updates an in-memory pointer, called LAP (LastAddPushed), which is the entry id of the last batched entry pushed to log segment store by the writer. The entries could be written out of order but only be acknowledged in entry id order. Along with the successful acknowledges, the log segment writer also updates an in-memory pointer, called LAC (LastAddConfirmed). LAC is the entry id of the last entry that already acknowledged by the writer. All the entries written between LAC and LAP are unacknowledged data, which they are not visible to readers.

The readers can read entries up to LAC as those entries are known to be durably replicated - thereby can be safely read without the risk of violating read ordering. The writer includes the current LAC in each entry that it sends to BookKeeper. Therefore each subsequent entry makes the records in the previous entry visible to the readers. LAC updates can be piggybacked on the next entry that are written by the writer. Since readers are strictly followers, they can leverage LAC to read durable data from any of the replicas without need for any communication or coordination with the writer.

DL introduces one type of system record, which is called control record - it acts as the commit request in two-phases-commit algorithm. If no application records arrive within the specified SLA, the writer will generate a control record. With writing the control record, it would advance the LAC of the log stream. The control record is added either immediately after receiving acknowledges from writing a user record or periodically if no application records are added. It is configured as part of writer's flushing policy. While control log records are present in the physical log stream, they are not delivered by the log readers to the application.
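To make the quoted passage concrete, here is a minimal Python sketch of the two in-memory pointers. It is a toy model, not DistributedLog code: `ToyWriter`, `Entry`, `push`, `on_store_ack`, and `write_control_record` are hypothetical names invented for illustration. It assumes the store can confirm entries out of order, while the writer only releases acknowledgements in entry id order and stamps every outgoing entry with its current LAC.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    entry_id: int
    payload: str
    lac: int               # LAC known to the writer when the entry was sent
    control: bool = False  # True for a control record (never shown to the app)


@dataclass
class ToyWriter:
    """Hypothetical model of a log segment writer (not the real API)."""
    lap: int = -1  # LastAddPushed: id of the last entry pushed to the store
    lac: int = -1  # LastAddConfirmed: id of the last acknowledged entry
    pending: dict = field(default_factory=dict)  # pushed, not yet acknowledged
    durable: set = field(default_factory=set)    # confirmed by store, not yet released

    def push(self, payload, control=False):
        # Assign the next entry id and piggyback the current LAC on the entry.
        self.lap += 1
        entry = Entry(self.lap, payload, self.lac, control)
        self.pending[entry.entry_id] = entry
        return entry

    def on_store_ack(self, entry_id):
        # The store may confirm entries out of order; acknowledgements to the
        # client are released strictly in entry id order, advancing the LAC.
        self.durable.add(entry_id)
        released = []
        while self.lac + 1 in self.durable:
            self.lac += 1
            self.durable.remove(self.lac)
            released.append(self.pending.pop(self.lac))
        return released

    def write_control_record(self):
        # A control record carries no user data; it only carries the LAC forward.
        return self.push("", control=True)


w = ToyWriter()
w.push("a")                        # entry 0, carries LAC -1
w.push("b")                        # entry 1, carries LAC -1
first = w.on_store_ack(1)          # out of order: nothing can be released yet
second = w.on_store_ack(0)         # releases entries 0 and 1, LAC becomes 1
control = w.write_control_record() # entry 2, carries LAC 1 to the store
```

In this model the client is acknowledged as soon as `on_store_ack` releases its entry, which happens before the new LAC has been written anywhere outside the writer's memory. That gap is exactly what the scenario in the question is about.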

Now consider the following scenario:

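  1. The leader publishes a message to BookKeeper.
  2. The followers get the message, append it to the log, and send an ACK to the leader.
  3. The leader gets the confirmation from the followers, increments the LAC, and replies to the client that the message is committed.
  4. Now the leader fails before it can piggyback to the followers the fact that the LAC has been incremented.
  5. The question is: since the potential leader is not aware that the LAC has been incremented, it becomes the new leader and truncates the log to the old LAC, which means we have lost an entry in the log that was confirmed by the previous leader.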

As a result, the client has been told that the message was written successfully, but the message has been lost.

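There are a few cases:

1) If the leader closes the log gracefully, it seals the log segment it is writing to. The LAC is advanced and is also recorded as part of the log segment metadata (stored in the metadata store).

2) If the leader crashes without closing the log gracefully and a potential leader comes up, the new leader goes through a recovery process. Its job is as follows: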

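  • a) It tries to seal the last log segment written by the previous leader. The seal process is done by the BookKeeper client and has two parts: (a) it fences the log segment; fencing guarantees that no more writes can happen in that log segment. (b) It then does a forward recovery from the last known LAC and recovers the entries that were written but not yet committed.

  • b) After the last log segment is recovered, the new leader opens a new log segment to write entries to.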
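The recovery steps above can be sketched with a small Python simulation. This is a toy model under stated assumptions, not the BookKeeper client: `ToyBookie`, `fence_and_read_lac`, and `recover_segment` are hypothetical names, quorum handling is reduced to "any replica that has the entry", and recovery copies entries directly into the replicas instead of using a real recovery write path. It replays the scenario from the question: entry 0 is stored on all replicas carrying the old LAC of -1, the old leader acknowledges the client and crashes before any later entry can carry LAC 0.

```python
class ToyBookie:
    """Hypothetical replica holding one log segment (not the real API)."""

    def __init__(self):
        self.entries = {}   # entry_id -> (payload, LAC carried by that entry)
        self.fenced = False

    def add(self, entry_id, payload, lac):
        # Normal write path used by the old leader; rejected once fenced.
        if self.fenced:
            raise RuntimeError("log segment is fenced")
        self.entries[entry_id] = (payload, lac)

    def fence_and_read_lac(self):
        # Fence first, then report the highest LAC this replica has seen.
        self.fenced = True
        return max((lac for _, lac in self.entries.values()), default=-1)


def recover_segment(bookies):
    # (a) Fence every replica so the old leader can no longer write.
    last_known_lac = max(b.fence_and_read_lac() for b in bookies)

    # (b) Forward recovery: starting right after the last known LAC, keep
    # reading the next entry id for as long as some replica still has it.
    last_entry = last_known_lac
    while True:
        next_id = last_entry + 1
        holders = [b for b in bookies if next_id in b.entries]
        if not holders:
            break
        recovered = holders[0].entries[next_id]
        for b in bookies:
            # Toy shortcut: re-replicate by writing into the dict directly.
            b.entries.setdefault(next_id, recovered)
        last_entry = next_id
    return last_known_lac, last_entry


# Old leader wrote entry 0 (carrying LAC -1), acknowledged the client, crashed.
bookies = [ToyBookie() for _ in range(3)]
for bookie in bookies:
    bookie.add(0, "msg", -1)

last_known_lac, last_entry = recover_segment(bookies)
```

Here `last_known_lac` is still the stale value, but `last_entry` ends up at 0: the segment is sealed at the last entry found by reading forward, not truncated to the old LAC. Under these assumptions, the entry the previous leader acknowledged survives the failover.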

Hope this answers your question.

There is also a DistributedLog paper published at ICDE 2017; you can get it from here.