Can message loss occur in Kafka even if the producer gets an acknowledgement for it?

The Kafka documentation says:

  • Kafka relies heavily on the filesystem for storing and caching messages.
  • A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes.
  • Modern operating systems have become increasingly aggressive in their use of main memory for disk caching. A modern OS will happily divert all free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache.
  • ...rather than maintain as much as possible in-memory and flush it all out to the filesystem in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
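To make the distinction in the last quoted point concrete, here is a minimal sketch of "written to the filesystem" versus "flushed to disk". It is not Kafka code: `append_record` and the file name `segment.log` are made up for illustration. It only shows that a plain `write` returns once the data is in the kernel's page cache, while `fsync` asks the OS to push it to the storage device.

```python
import os
import tempfile


def append_record(path: str, record: bytes, force_flush: bool) -> int:
    """Append a record to a log file (hypothetical helper, not a Kafka API).

    Without force_flush the bytes may live only in the kernel page cache
    when this function returns; a power loss could still drop them.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = os.write(fd, record)  # returns once the kernel has the data
        if force_flush:
            os.fsync(fd)  # ask the OS to write the cached data to the device
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "segment.log")
        append_record(log_path, b"message-1\n", force_flush=False)
        append_record(log_path, b"message-2\n", force_flush=True)
        # Both records are readable right away, because reads are served
        # from the same page cache, whether or not they reached the disk.
        with open(log_path, "rb") as f:
            print(f.read())
```

Both calls look identical to a reader of the file; the difference only shows up if the machine loses power before the OS writes the cached pages back.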

Further, this article says:

(3) a message is ‘committed’ when all in sync replicas have applied it to their log, and (4) any committed message will not be lost, as long as at least one in sync replica is alive.

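So even if I configure the producer with acks=all (which causes the producer to receive an acknowledgement only after all brokers have committed the message) and the producer receives an acknowledgement for a given message, is it still possible for that message to get lost, especially if all brokers go down and the OS never flushes the committed message from its cache to disk?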

With acks=all and a topic replication factor > 1, it is still possible to lose acknowledged messages, but it is pretty unlikely.

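For example, if you have 3 replicas (all in sync) and use acks=all, you would need to lose all 3 brokers at the same time, before any of them has had time to do the actual write to disk. With acks=all, the acknowledgement is sent once all in-sync replicas have received the message. You can make sure the number of in-sync replicas stays high by setting, for example, min.insync.replicas=2.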
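As a rough illustration of that reasoning, here is a toy model, not Kafka code. `Partition`, `produce_outcome` and `acked_message_survives` are names invented for this sketch; only `acks` and `min.insync.replicas` are real Kafka settings. It assumes a "crash" means a machine-level failure that discards the page cache, and it ignores the details of leader election and recovery.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Toy description of one partition (hypothetical, for illustration)."""
    replication_factor: int
    in_sync_replicas: int      # current size of the ISR
    min_insync_replicas: int   # topic setting min.insync.replicas


def produce_outcome(p: Partition, acks: str) -> str:
    """What the producer sees for a single send in this simplified model."""
    # min.insync.replicas is only enforced for acks=all; with too few
    # in-sync replicas the write is refused instead of acknowledged.
    if acks == "all" and p.in_sync_replicas < p.min_insync_replicas:
        return "rejected"
    return "acknowledged"


def acked_message_survives(p: Partition, crashed_before_flush: int) -> bool:
    """After an acks=all acknowledgement every in-sync replica holds the
    message, possibly only in its page cache. In this model it is lost
    only if every one of them crashes before flushing to disk."""
    return crashed_before_flush < p.in_sync_replicas


if __name__ == "__main__":
    healthy = Partition(replication_factor=3, in_sync_replicas=3,
                        min_insync_replicas=2)
    print(produce_outcome(healthy, "all"))        # acknowledged
    print(acked_message_survives(healthy, 2))     # True
    print(acked_message_survives(healthy, 3))     # False

    degraded = Partition(replication_factor=3, in_sync_replicas=1,
                         min_insync_replicas=2)
    print(produce_outcome(degraded, "all"))       # rejected
```

The `degraded` case is what min.insync.replicas=2 buys you: rather than acknowledging a message held by a single broker, the write fails and the producer can retry.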

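You can reduce the possibility of this scenario even further by using the rack awareness feature (with the brokers, obviously, physically placed in different racks or, even better, different data centers).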
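The idea behind rack awareness can be sketched as follows. This is deliberately not Kafka's actual assignment algorithm, just a simplified illustration of spreading one partition's replicas over as many racks as possible; `spread_replicas` and the broker-to-rack mapping are hypothetical (in a real cluster each broker declares its rack through the `broker.rack` setting).

```python
from itertools import zip_longest


def spread_replicas(broker_racks: dict, replication_factor: int) -> list:
    """Pick brokers for one partition so that replicas land on as many
    distinct racks as possible (simplified sketch, not Kafka's algorithm).

    broker_racks maps a broker id to the name of its rack.
    """
    # Group broker ids by rack, in a deterministic order.
    by_rack = {}
    for broker_id in sorted(broker_racks):
        by_rack.setdefault(broker_racks[broker_id], []).append(broker_id)

    # Interleave the racks: first broker of every rack, then the second, ...
    groups = [by_rack[rack] for rack in sorted(by_rack)]
    interleaved = [
        broker_id
        for round_ in zip_longest(*groups)
        for broker_id in round_
        if broker_id is not None
    ]

    if replication_factor > len(interleaved):
        raise ValueError("replication factor exceeds number of brokers")
    return interleaved[:replication_factor]


if __name__ == "__main__":
    # Hypothetical cluster: six brokers, two per rack.
    cluster = {1: "rack-a", 2: "rack-a", 3: "rack-b",
               4: "rack-b", 5: "rack-c", 6: "rack-c"}
    print(spread_replicas(cluster, 3))  # [1, 3, 5]
```

With one replica per rack, losing an entire rack (power, switch) still leaves two replicas that hold the acknowledged message.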

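To summarize, by using all of these options you can reduce the likelihood of losing data far enough that it is very unlikely to ever happen.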