Kafka Streams 1.0: assigning partitions to threads
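I am using Kafka Streams with a simple stateless processor topology.
I have a topic with 100 partitions and 2 machines, each running the same Streams application with 50 threads, so I end up with a 1-to-1 mapping between partitions and threads.
The messages in the topic are already keyed.
I have a logical constraint: once a thread is attached to one or more partitions, it should keep processing those partitions (until a restart, of course, when the assignment is reshuffled).
I can see in the logs that the threads repeatedly (re)join the consumer group.
My question: does the Kafka Streams API guarantee that a thread keeps processing the same partition(s) it was attached to at application startup, or does it reshuffle them from time to time?
I looked through the documentation but could not find anything that discusses this in detail.
Here is the code I use: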
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Document> topicStreams = builder.stream(sourceTopic);
topicStreams.process(() -> new CustomMsgProcessor());
KafkaStreams streams = new KafkaStreams(builder.build(), config);
streams.start();
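The `config` object is not shown in the question. As a rough sketch only, the 50 threads per machine described above would typically come from the `num.stream.threads` setting. The application id and broker address below are hypothetical placeholders, and the serde settings a real application needs are omitted.

```java
import java.util.Properties;

public class StreamsConfigSketch {
    // Builds the Properties object that would be passed as `config`
    // to `new KafkaStreams(builder.build(), config)`.
    // The keys are standard Kafka Streams setting names; the values are assumptions.
    static Properties buildConfig(String applicationId, String bootstrapServers, int threads) {
        Properties config = new Properties();
        config.put("application.id", applicationId);        // also used as the consumer group id
        config.put("bootstrap.servers", bootstrapServers);
        config.put("num.stream.threads", Integer.toString(threads)); // threads per application instance
        return config;
    }

    public static void main(String[] args) {
        // Hypothetical values: 50 threads on each of the 2 machines.
        Properties config = buildConfig("doc-processor", "localhost:9092", 50);
        System.out.println(config.getProperty("num.stream.threads")); // prints 50
    }
}
```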
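When your Streams application starts, it creates the number of threads you specified and sets them to listen for partition assignment. When partitions are assigned to a thread, the thread creates tasks for those partitions. When those partitions receive input, the thread processes it with the corresponding tasks.
So, if I understand your constraint correctly, then given the lifecycle of a Streams application, yes: once a thread receives a partition assignment, it keeps processing those partitions until it shuts down or a rebalance occurs. Note that a thread (re)joining the consumer group triggers a rebalance, and a rebalance can hand partitions to a different thread.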
The Architecture section of the documentation says:
Kafka Streams creates a fixed number of stream tasks based on the input stream partitions for the application, with each task being assigned a list of partitions from the input streams (i.e., Kafka topics). The assignment of stream partitions to stream tasks never changes
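To make the distinction concrete, here is a toy model in plain Java. It is not Kafka's actual assignor, and the round-robin policy and thread names are assumptions made for illustration. It shows that the partition-to-task mapping stays fixed (task `0_<p>` always reads partition `p`), while the task-to-thread mapping can change when group membership changes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaskAssignmentSketch {
    // Fixed mapping: with a single source topic, the task for partition p is "0_<p>".
    static String taskFor(int partition) {
        return "0_" + partition;
    }

    // Toy round-robin assignment of tasks to the threads currently in the group.
    // This is an illustrative stand-in, NOT the assignor Kafka Streams uses.
    static Map<String, String> assign(int partitions, List<String> threads) {
        Map<String, String> ownerByTask = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            ownerByTask.put(taskFor(p), threads.get(p % threads.size()));
        }
        return ownerByTask;
    }

    public static void main(String[] args) {
        // Hypothetical group: two machines (A and B) with two threads each.
        Map<String, String> before = assign(4, Arrays.asList("A-1", "A-2", "B-1", "B-2"));
        // Machine B leaves the group, which forces a rebalance.
        Map<String, String> after = assign(4, Arrays.asList("A-1", "A-2"));

        System.out.println("before rebalance: " + before);
        System.out.println("after rebalance:  " + after);
        // The set of tasks is identical in both maps; only the owning threads differ.
        System.out.println("same tasks: " + before.keySet().equals(after.keySet()));
    }
}
```

In this model, task `0_2` moves from thread `B-1` to thread `A-1` after the rebalance, yet it still reads partition 2. The guarantee in the quoted passage is about tasks, not threads.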