Ideal number of partitions for Kafka topic

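I am currently working on a setup with 6 Kafka brokers. Data is being pushed to my topic from two producers at roughly 4000 messages per second, and I have 5 consumers working on this topic as a group. What is the ideal number of partitions for my Kafka topic?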

If any changes are also needed to the brokers/consumers/producers, please feel free to let me know.

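In general, more partitions means higher throughput. However, there are other considerations too, such as the limits of the hardware you are running on, whether you use compression, and so on. Confluent has a good write-up that gives insight into a rough calculation you can use to arrive at a partition count.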

A rough formula for picking the number of partitions is based on throughput. You measure the throughput that you can achieve on a single partition for production (call it p) and consumption (call it c). Let’s say your target throughput is t. Then you need to have at least max(t/p, t/c) partitions. The per-partition throughput that one can achieve on the producer depends on configurations such as the batching size, compression codec, type of acknowledgement, replication factor, etc.
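To make the formula concrete, here is a minimal sketch of the calculation. The target of 4000 messages per second comes from the question. The per-partition rates p = 2000 and c = 500 are made-up numbers for illustration only, not measurements, and `min_partitions` is just a hypothetical helper name.

```python
import math


def min_partitions(target, per_partition_produce, per_partition_consume):
    """Return the lower bound max(t/p, t/c), rounded up to a whole partition.

    All three rates must use the same unit (here: messages per second).
    """
    if min(target, per_partition_produce, per_partition_consume) <= 0:
        raise ValueError("all throughput values must be positive")
    needed = max(target / per_partition_produce, target / per_partition_consume)
    return math.ceil(needed)


# t = 4000 msg/s is taken from the question.
# p = 2000 and c = 500 are assumed values; replace them with your own benchmarks.
print(min_partitions(4000, 2000, 500))  # 8
```

With these assumed numbers the consumer side is the bottleneck (4000/500 = 8 versus 4000/2000 = 2). Whatever the result, with 5 consumers in one group you would probably also want at least 5 partitions, since a consumer without an assigned partition sits idle.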

As for the consumer:

The consumer throughput is often application dependent since it corresponds to how fast the consumer logic can process each message.

So the best approach is to measure and benchmark against your own use case.
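As a starting point for the consumer side, something like the following rough sketch can time your processing logic in isolation. It does not talk to Kafka at all: `handle` is a hypothetical stand-in for whatever your consumer does with each message, and the sample payloads are invented, so the number it prints only approximates c once you substitute your real logic and realistic messages.

```python
import time


def measure_throughput(process, messages):
    """Run process() over every message and return messages per second."""
    start = time.perf_counter()
    for message in messages:
        process(message)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(messages) / elapsed if elapsed > 0 else float("inf")


def handle(message: bytes) -> int:
    # Hypothetical per-message work; replace with your real consumer logic.
    return len(message.decode("utf-8").upper())


# Invented sample payloads; use captured production messages if you can.
sample = [b"example payload %d" % i for i in range(10_000)]
rate = measure_throughput(handle, sample)
print(f"{rate:.0f} messages/second")
```

This leaves out network fetch, deserialization and offset commits, so treat the result as an upper bound on per-partition consumption and confirm it against a real topic.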