Kafka Streams 中的一项任务何时可以有多个输入分区？

Question

给定 :

the maximum number of partitions over all topics determines the number of tasks.

和 AbstractTask 的代码包含以下行：

final Set<TopicPartition> inputPartitions

我想知道什么时候（如果有的话）一个任务可以分配多个分区？

Answer 1

例如，如果您执行 join()、merge() 或 copartition()，任务将有多个输入分区。此外，如果您通过模式订阅一次阅读多个主题。

正交于

the maximum number of partitions over all topics determines the number of tasks

报价是关于创建任务的数量，与每个任务的分区数量无关。

假设您有两个输入主题 A 和 2 个分区（A-0 和 A-1）和 B 只有一个分区（B-0）。您的程序是：

KStream a = builder.stream("A",...);
KStream b = builder.stream("B",...);
a.merge(b);

你的程序逻辑上是：

topic-A ---+
           +---> merge()
topic-B ---+

对于这种情况，您将获得两个任务：

Task 0_0:

A-0 ---+
       +--- merge() -->
B-0 ---+

Task 0_1:

A-1 ---+
       +--- merge() -->

注意第二个任务0_1只有一个输入分区，因为主题B只有1个分区。

任务基本上是您的（逻辑）程序的一个副本（物理实例化），该程序处理名称为分区号的所有分区。因为topic-A有两个partition，所以需要创建两个task

When could a task in Kafka Streams have more than one input partition?