How do KTables interact with Kafka on startup?

I'm a little confused about how this works conceptually.

How does Kafka Streams guarantee that the partitions assigned to it by the Kafka broker match the partitions assigned for the other topics? It seems some coordination is needed. Also, does Kafka Streams always read a compacted topic from the beginning, or from the latest offset? And once it has read messages from the compacted topic, does it commit the offset?

How does Kafka Streams guarantee that the partitions assigned to it from the Kafka broker match the partitions assigned for other topics?

A Kafka Streams application subscribes to one or more topics under an application.id, which is analogous to group.id in a plain Kafka consumer client.

When a client subscribes to a topic on a Kafka broker with a given group.id, the broker returns a set of that topic's partitions to it. If all of the topic's partitions are already assigned to stream instances running under the same application.id, a rebalance is triggered: the newly started stream instance receives its share of the partitions, and the existing instances stop listening to those partitions.
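
For illustration, here is a minimal Java sketch (the topic name, application.id and bootstrap address are made up) of how application.id is set and how a KTable is built from a compacted topic; every instance started with the same application.id joins the same group and receives a share of the topic's partitions:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

public class KTableStartupExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id plays the role of group.id: all instances started with
        // the same value form one group and split the topic's partitions.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-profile-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Build a KTable from a (typically compacted) topic; each instance
        // materializes state only for the partitions it is assigned.
        KTable<String, String> profiles = builder.table(
                "user-profiles",
                Consumed.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close cleanly so offsets are committed and state stores are flushed.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```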

Does Kafka Streams always read the compacted topic from the beginning, or does it read from the latest offset?

Compacted or not, a Kafka Streams application reads from its last committed offset.
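
To illustrate (topic name assumed): a committed offset for the application.id always takes precedence; the offset reset policy only matters when no committed offset exists yet, for example on the very first start of the application, and it can be pinned per source:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;

public class OffsetResetExample {
    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // A committed offset for this application.id always wins. The reset
        // policy below applies only when no committed offset exists yet,
        // e.g. on the very first start of the application.
        builder.table(
                "user-profiles",  // hypothetical compacted input topic
                Consumed.with(Serdes.String(), Serdes.String())
                        .withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST));

        return builder.build();
    }
}
```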

Once it reads the messages from the compacted topic, does it commit the offset?

From the wiki, it says:

Kafka Streams commit the current processing progress in regular intervals (parameter commit.interval.ms). If a commit is triggered, all state stores need to flush data to disk, i.e., all internal topics needs to get flushed to Kafka. Furthermore, all user topics get flushed, too. Finally, all current topic offsets are committed to Kafka. In case of failure and restart, the application can resume processing from its last commit point (providing at-least-once processing guarantees).

When writing a Kafka Streams application, the developer does not need to commit offsets manually; this is handled internally by Kafka Streams.
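
A minimal sketch of the configuration side (values chosen only for illustration): the commit cadence is controlled by commit.interval.ms, and the application code itself never calls commitSync()/commitAsync().

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class CommitConfigExample {
    public static Properties streamsConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-profile-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Offsets are committed (and state stores flushed) on this interval;
        // the default is 30000 ms under at-least-once processing.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10_000);

        // Optional: switch from the default at-least-once to exactly-once
        // semantics (requires brokers that support it).
        // props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        return props;
    }
}
```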