Kafka Size of ISR Set(3) insufficient for min.isr 2

I am getting a strange Kafka server error while mirroring data with MirrorMaker 1 on Apache Kafka 2.6.

org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(3) is insufficient to satisfy the min.isr requirement of 2 for partition FooBar-0

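The strange thing is that min.isr is set to 2 and the ISR Set has 3 nodes. Nevertheless, I get the NotEnoughReplicasException.

Digging deeper into the topic did not reveal anything unusual either: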

[root@LoremIpsum kafka]# /usr/lib/kafka/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic FooBar
Topic: FooBar       PartitionCount: 1       ReplicationFactor: 3    Configs: min.insync.replicas=2,cleanup.policy=compact,segment.bytes=1073741824,max.message.bytes=5242880,min.compaction.lag.ms=604800000,message.timestamp.type=LogAppendTime,unclean.leader.election.enable=false
        Topic: FooBar       Partition: 0    Leader: 3       Replicas: 2,3,1 Isr: 3

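The logs of all 3 nodes look normal (as far as I can judge). Is there any other reason that could produce this message? What else could I check?

Many thanks for any advice!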


Consumer config

exclude.internal.topics=true
auto.offset.reset=earliest
enable.auto.commit=false
isolation.level=read_committed
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
max.partition.fetch.bytes=5242880

Producer config

acks=all
enable.idempotence=true
max.in.flight.requests.per.connection=1
#retries=
#delivery.timeout.ms=
#request.timeout.ms=
#linger.ms
batch.size=1000
max.request.size=5242880

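The term "ISR Set(3)" does not mean that the ISR has three members. It is a set containing only broker #3, so only one replica is in sync, which is below the min.isr requirement of 2. This is also visible in the output of the kafka-topics command (Isr: 3). Apparently, something is wrong with the data replication between your brokers.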
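
To make the reading of the "Isr:" column concrete, here is a minimal Python sketch that parses the text printed by kafka-topics.sh --describe and reports partitions whose in-sync replica list is shorter than min.insync.replicas. The function name under_min_isr and the default_min_isr parameter are made up for illustration, and the parsing assumes the output layout shown in the question.

```python
import re

def under_min_isr(describe_output, default_min_isr=1):
    """Return (topic, partition, isr_broker_ids, min_isr) for every partition
    whose ISR has fewer members than the topic's min.insync.replicas."""
    min_isr = {}    # topic name -> min.insync.replicas from the "Configs:" field
    problems = []
    for raw in describe_output.splitlines():
        line = raw.strip()
        # Topic summary line: carries the topic-level configs.
        summary = re.search(r"^Topic:\s*(\S+).*Configs:\s*(\S*)", line)
        if summary:
            cfg = re.search(r"min\.insync\.replicas=(\d+)", summary.group(2))
            if cfg:
                min_isr[summary.group(1)] = int(cfg.group(1))
            continue
        # Partition line: "Isr:" is a comma-separated list of broker IDs.
        part = re.search(
            r"^Topic:\s*(\S+)\s+Partition:\s*(\d+).*Isr:\s*([\d,]*)", line)
        if part:
            topic = part.group(1)
            isr_ids = [int(x) for x in part.group(3).split(",") if x]
            required = min_isr.get(topic, default_min_isr)
            if len(isr_ids) < required:
                problems.append((topic, int(part.group(2)), isr_ids, required))
    return problems

sample = """\
Topic: FooBar  PartitionCount: 1  ReplicationFactor: 3  Configs: min.insync.replicas=2,cleanup.policy=compact
        Topic: FooBar  Partition: 0  Leader: 3  Replicas: 2,3,1  Isr: 3
"""
print(under_min_isr(sample))  # [('FooBar', 0, [3], 2)]
```

For the topic in the question, the ISR list has one entry (broker 3) while two are required, which is exactly the condition the broker complains about.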

Under the hood, MirrorMaker 1 uses a plain KafkaConsumer and KafkaProducer to do the work. According to the JavaDocs of the producer Callback, NotEnoughReplicasException is a retriable exception.

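Therefore, you can get around this error by setting the following producer configs, which let the producer keep retrying while the replicas catch up: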

acks=all: The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. 
retry.backoff.ms=1000: The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios.
delivery.timeout.ms=300000: An upper bound on the time to report success or failure after a call to send() returns. This limits the total time that a record will be delayed prior to sending, the time to await acknowledgement from the broker (if expected), and the time allowed for retriable send failures.
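
As a rough back-of-the-envelope check of how these two values interact, the following Python sketch estimates how many send attempts fit into delivery.timeout.ms. It assumes a fixed backoff between attempts, a retries setting high enough that the timeout is the effective limit, and it ignores linger.ms and batching delays. The function name and the attempt_latency_ms parameter (the assumed time one failed attempt takes) are made up for illustration.

```python
def estimate_max_attempts(delivery_timeout_ms, retry_backoff_ms,
                          attempt_latency_ms=50):
    """Rough upper bound on send attempts before delivery.timeout.ms expires.

    Assumes every attempt fails after attempt_latency_ms and that the
    producer waits retry_backoff_ms before each retry.
    """
    per_retry_ms = retry_backoff_ms + attempt_latency_ms
    if delivery_timeout_ms <= 0 or per_retry_ms <= 0:
        raise ValueError("timeouts must be positive")
    # The first attempt is not preceded by a backoff.
    remaining_ms = delivery_timeout_ms - attempt_latency_ms
    if remaining_ms < 0:
        return 0
    return 1 + remaining_ms // per_retry_ms

# Values from the answer: retry.backoff.ms=1000, delivery.timeout.ms=300000
print(estimate_max_attempts(300_000, 1_000))
```

With the values above, the producer gets a few hundred attempts spread over five minutes, so the record only goes through if the ISR grows back to at least two replicas within that window.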

All details on the KafkaProducer configuration are given in the producer configs section of the official Apache Kafka documentation.