Lagged offsets skipped after a new event is published before the max poll interval in Kafka
Kafka v2.4 consumer configuration:-
kafka.consumer.auto.offset.reset=earliest
kafka.consumer.auto.commit=false
Kafka consumer container configuration:-
@Bean
public ConcurrentKafkaListenerContainerFactory<String, PayoutDto> kafkaPayoutStatusPoolListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, PayoutDto> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactoryForPayoutEvent());
    // Offsets are committed only when the listener calls ack.acknowledge().
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.setMissingTopicsFatal(false);
    return factory;
}
Kafka consumer:-
@KafkaListener(id = "regularPayoutEventConsumer",
        topics = "${kafka.regular.payout.consumer.queuename}",
        containerFactory = "kafkaPayoutStatusPoolListenerContainerFactory",
        groupId = "${kafka.regular.payout.consumer.groupId}")
public void listen(ConsumerRecord<String, PayoutDto> consumerRecord, Acknowledgment ack) {
    StopWatch watch = new StopWatch();
    watch.start();
    String key = null;
    Long offset = null;
    try {
        PayoutDto payoutDto = consumerRecord.value();
        key = consumerRecord.key();
        offset = consumerRecord.offset();
        cpAccountsService.processPayoutEvent(payoutDto);
        // Commit the offset only after the event was processed successfully.
        ack.acknowledge();
    } catch (Exception e) {
        // The exception is swallowed here, so the record is neither acknowledged nor retried.
        log.error("Exception occurred in RegularPayoutEventConsumer", e);
    } finally {
        watch.stop();
        log.debug("total time taken by consumer for requestID:" + key + " on offset:" + offset + " is:"
                + watch.getTotalTimeMillis());
    }
}
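Success scenario:-
- The consumer fails to acknowledge because of an exception, which causes lag. Assume the last committed offset is 30 and the lag is now 4.
- In the next automatic poll cycle after the poll interval, the consumer resumes consuming normally, from offset 30 through 33, and the lag is now 0.
Failed scenario:-
- Same as step 1 of the success scenario.
- Before the consumer's poll interval elapses, the producer pushes a new message.
- On that new producer event, the consumer pulls the data, jumps directly to the record at offset 33, skips 30, 31 and 32, and clears the lag to 0.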
Kafka application startup log:-
2021-04-14 10:38:06.132 INFO 10286 --- [ restartedMain] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=consumer-RegularPayoutEventGroupId-3, groupId=RegularPayoutEventGroupId] Subscribed to topic(s): InstantPayoutTransactionsEv
2021-04-14 10:38:06.132 INFO 10286 --- [ restartedMain] o.s.s.c.ThreadPoolTaskScheduler : Initializing ExecutorService
2021-04-14 10:38:06.133 INFO 10286 --- [ restartedMain] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = consumer-PayoutEventGroupId-4
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = PayoutEventGroupId
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 30000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class com.cms.cpa.config.KafkaPayoutDeserializer
2021-04-14 10:38:06.137 INFO 10286 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.6.0
2021-04-14 10:38:06.137 INFO 10286 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 62abe01bee039651
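Kafka maintains two values for each consumer/partition: the committed offset (where the consumer starts if it is restarted) and the position (which record the next poll will return).
Not acknowledging a record does not cause the position to be moved back.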
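The difference between the two values can be illustrated with a toy in-memory model. This is only a sketch: PartitionCursor and its methods are hypothetical names invented for illustration, not part of the Kafka client or Spring Kafka, and the offsets are the ones from the question.

```java
// Toy model of one consumer/partition pair; NOT the Kafka client API.
final class PartitionCursor {
    private long committed;  // where a restarted consumer would begin
    private long position;   // offset the next poll() will return

    PartitionCursor(long committedOffset) {
        this.committed = committedOffset;
        this.position = committedOffset;
    }

    /** Returns the next offset and advances the position, acknowledged or not. */
    long poll() {
        return position++;
    }

    /** Models ack.acknowledge(): commits the offset following the given record. */
    void acknowledge(long offset) {
        committed = offset + 1;
    }

    /** Models a restart: the position falls back to the committed offset. */
    void restart() {
        position = committed;
    }

    long committed() { return committed; }

    long position() { return position; }

    public static void main(String[] args) {
        PartitionCursor cursor = new PartitionCursor(30);
        // Records 30..32 are delivered, the listener fails, nothing is acknowledged.
        for (int i = 0; i < 3; i++) {
            cursor.poll();
        }
        System.out.println("committed=" + cursor.committed() + " position=" + cursor.position());
        // A new record arrives at offset 33 and is processed successfully.
        long next = cursor.poll();
        cursor.acknowledge(next);
        System.out.println("delivered=" + next + " committed=" + cursor.committed());
    }
}
```

In this model the unacknowledged records are seen again only if restart() runs before a later record is acknowledged. Once offset 33 is acknowledged, the committed offset moves past 30 to 32 and the lag disappears, which matches the failed scenario.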
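It is working as designed; if you want to reprocess a failed record, you need to use acknowledgment.nack() with an optional sleep time, or throw an exception and configure a SeekToCurrentErrorHandler.
In those cases, the container repositions the partitions so that the failed record is redelivered. With the error handler, you can "recover" the failed record after the retries are exhausted. When using nack(), the listener has to keep track of the attempts itself.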
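The redelivery behaviour described above can be sketched as a small stand-alone loop. This is a simplified model under stated assumptions, not Spring Kafka's implementation: RetryingDelivery, Listener and deliver are hypothetical names, a listener signals failure by returning false instead of throwing, and "recovering" a record is reduced to counting it and moving on.

```java
// Toy model of seek-and-retry delivery; NOT Spring Kafka's implementation.
final class RetryingDelivery {
    /** Hypothetical listener contract: return false to signal a failed record. */
    interface Listener {
        boolean process(long offset);
    }

    /**
     * Delivers offsets [start, end) in order. A failed offset is redelivered
     * (the position stays on it) until maxAttempts is reached, after which it
     * is counted as recovered and skipped.
     * Returns the number of recovered (given-up) records.
     */
    static int deliver(long start, long end, int maxAttempts, Listener listener) {
        int recovered = 0;
        int attempts = 0;
        long position = start;
        while (position < end) {
            attempts++;
            if (listener.process(position)) {
                position++;       // success: move on to the next record
                attempts = 0;
            } else if (attempts >= maxAttempts) {
                recovered++;      // retries exhausted: hand off and skip
                position++;
                attempts = 0;
            }
            // otherwise the position is unchanged, so the same record is redelivered
        }
        return recovered;
    }

    public static void main(String[] args) {
        int[] calls = new int[1];
        // Offset 31 fails on its first two deliveries, then succeeds.
        int recovered = deliver(30, 34, 3, offset -> {
            calls[0]++;
            return offset != 31 || calls[0] >= 4;
        });
        System.out.println("calls=" + calls[0] + " recovered=" + recovered);
    }
}
```

The key point is that a failure keeps the position on the failed offset, so later records (such as a newly produced offset 33) are not delivered until the failed one either succeeds or is given up on.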
See https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets
and https://docs.spring.io/spring-kafka/docs/current/reference/html/#annotation-error-handling