Kafka Streams stateStores fault tolerance exactly once?

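We're trying to build a deduplication service using Kafka Streams. The big picture is that it will use its RocksDB state store to check for existing keys during processing.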

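Correct me if I'm wrong, but to make these stateStores fault tolerant as well, the Kafka Streams API transparently copies the values in the stateStore into a Kafka topic (called the changelog). That way, if our service goes down, another service will be able to rebuild its stateStore from the changelog found in Kafka.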

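But it raises a question in my mind: is this "StateStore --> changelog" itself exactly once? I mean, when the service updates its stateStore, does it also update the changelog in an exactly-once fashion? If the service crashes, another one will take over the load, but can we be sure it won't miss a stateStore update from the crashed service?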

Regards,

Yannick

The short answer is yes.

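Using transactions (atomic multi-partition writes), Kafka Streams ensures that when an offset commit is performed, the state store is also flushed to the changelog topic on the brokers. These operations are atomic, so if any one of them fails, the application will reprocess messages from the previous offset position.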
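To make the "all or nothing" idea concrete, here is a toy in-memory model in plain Java. It is only a sketch and does not use any Kafka API: the class `AtomicCommitSketch`, its fields, and the `crashAfter` parameter are all made up for illustration. It assumes a deduplicating processor whose changelog writes and offset commit become visible together, or not at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only (hypothetical names, no Kafka API): changelog writes and the offset commit succeed or fail together. */
public class AtomicCommitSketch {
    // Durable, committed side; stands in for the changelog topic and the committed consumer offset.
    final List<String> changelog = new ArrayList<>();
    int committedOffset = 0;

    /** Rebuilds the "seen keys" store from the committed changelog, as a replacement instance would. */
    Set<String> restoreStore() {
        return new HashSet<>(changelog);
    }

    /**
     * Processes the input from the committed offset onward, dropping keys already seen.
     * If crashAfter >= 0, a crash is simulated after that many records, before the commit.
     * Returns the records made visible by a successful commit (empty if the run crashed).
     */
    List<String> run(List<String> input, int crashAfter) {
        Set<String> store = restoreStore();
        List<String> pendingChangelog = new ArrayList<>();
        List<String> pendingOutput = new ArrayList<>();
        int offset = committedOffset;
        int processed = 0;
        while (offset < input.size()) {
            if (processed == crashAfter) {
                // Simulated crash: nothing pending is committed, so the changelog and offset stay unchanged.
                return new ArrayList<>();
            }
            String key = input.get(offset);
            if (store.add(key)) {           // first time this key is seen
                pendingChangelog.add(key);  // store update destined for the changelog
                pendingOutput.add(key);     // deduplicated output record
            }
            offset++;
            processed++;
        }
        // Commit step: changelog records and the new offset are applied together.
        changelog.addAll(pendingChangelog);
        committedOffset = offset;
        return pendingOutput;
    }

    public static void main(String[] args) {
        AtomicCommitSketch app = new AtomicCommitSketch();
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(app.run(input, 3));  // first instance crashes before committing
        System.out.println(app.run(input, -1)); // replacement instance reprocesses from offset 0
    }
}
```

In this model the crashed run leaves neither changelog entries nor a moved offset behind, so the replacement instance restores an empty store, reprocesses from the old offset, and still emits each key once.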

You can read more about exactly-once semantics in the following blog post: https://www.confluent.io/blog/enabling-exactly-kafka-streams/ (see the section "How Kafka Streams Guarantees Exactly-Once Processing").

But it raises a question in my mind: is this "StateStore --> changelog" itself exactly once?

Yes, as others have already said here. Of course, you must configure your application to use exactly-once semantics via the configuration parameter processing.guarantee; see https://kafka.apache.org/21/documentation/streams/developer-guide/config-streams.html#processing-guarantee (this link is for Apache Kafka 2.1).
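A minimal sketch of that setting, using plain string keys so the snippet compiles without the Kafka dependency. The class name, the application id "dedup-service", and the broker address are placeholders I made up. The value "exactly_once" is the one documented for Apache Kafka 2.1; later releases may document a different value, so check the docs for your version.

```java
import java.util.Properties;

/** Sketch only: builds the Properties one would hand to a Kafka Streams application. */
public class ExactlyOnceConfigSketch {

    static Properties buildStreamsProperties(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // hypothetical application id
        props.put("bootstrap.servers", bootstrapServers); // hypothetical broker address
        // Switches from the default at-least-once guarantee to exactly-once (value as documented for Kafka 2.1).
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildStreamsProperties("dedup-service", "localhost:9092");
        System.out.println(props.getProperty("processing.guarantee"));
    }
}
```

In a real application you would more likely use the `StreamsConfig` constants instead of raw strings and pass the resulting `Properties` to the `KafkaStreams` constructor along with your topology.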

We're trying to achieve a deduplication service using Kafka Streams. The big picture is that it will use its rocksDB state store in order to check existing keys during process.

There is also an event deduplication example application at https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java. The link points to the repo branch for Confluent Platform 5.1.0, which uses Apache Kafka 2.1.0, the latest version of Kafka available at the time of writing.

fault-tolerance

apache-kafka

apache-kafka-streams