Spring webflux 应用中的 Reactor Kafka 健康检查

Reactor Kafka health check in a Spring webflux app

我有一个 Reactor Kafka 应用程序,它无限期地使用来自主题的消息。我需要公开一个健康检查 REST 端点,它可以指示此过程的健康状况——主要是想知道 Kafka 接收器通量序列是否已终止,以便可以采取一些措施来启动它。有没有办法知道通量的当前状态(completed/terminated 等)?应用是SpringWebflux + Reactor Kafka.

编辑 1 - doOnTerminate/doFinally 不执行

        Flux.range(1, 5)
                .flatMap(record -> Mono.just(record)
                        .map(i -> {
                                throw new OutOfMemoryError("Forcing exception for " + i);
                        })
                        .doOnNext(i -> System.out.println("doOnNext: " + i))
                        .doOnError(e -> System.err.println(e))
                        .onErrorResume(e -> Mono.empty()))
                .doFinally(signalType -> System.err.println("doFinally: Terminating with Signal type: " + signalType))
                .doOnTerminate(()-> System.err.println("doOnTerminate: executed"))
                .subscribe();
"C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\lib\idea_rt.jar=52295:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_211\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\rt.jar;C:\Users\akoul680\intellij-workspace\basics\target\classes;C:\Users\akoul680\.m2\repository\com\zaxxer\HikariCP.4.1\HikariCP-3.4.1.jar;C:\Users\akoul680\.m2\repository\org\apache\kafka\kafka-clients.2.0\kafka-clients-2.2.0.jar;C:\Users\akoul680\.m2\repository\com\github\luben\zstd-jni.3.8-1\zstd-jni-1.3.8-1.jar;C:\Users\akoul680\.m2\repository\org\lz4\lz4-java.5.0\lz4-java-1.5.0.jar;C:\Users\akoul680\.m2\repository\org\xerial\snappy\snappy-java.1.7.2\snappy-java-1.1.7.2.jar;C:\Users\akoul680\.m2\repository\org\apache\avro\avro.9.0\avro-1.9.0.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-core.9.8\jackson-core-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-databind.9.8\jackson-databind-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-annotations.9.0\jackson-annotations-2.9.0.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-compress.18\commons-compress-1.18.jar;C:\Users\akoul680\.m2\repository\com\codahale\metrics\metrics-core.0.2\metrics-core-3.0.2.jar;C:\Users\akoul680\.m2\repository\org\junit\jupiter\junit-jupiter-api.3.2\junit-jupiter-api-5.3.2.jar;C:\Users\akoul680\.m2\repository\org\apiguardian\apiguardian-api.0.0\apiguardian-api-1.0.0.jar;C:\Users\akoul680\.m2\repository\org\opentest4j\opentest4j.1.1\opentest4j-1.1.1.jar;C:\Users\akoul680\.m2\repository\org\junit\platform\junit-platform-commons.3.2\junit-platform-commons-1.3.2.jar;C:\Users\akoul680\.m2\repository\org\slf4j\slf4j-api.7.26\slf4j-api-1.7.26.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-core.2.3\logback-core-1.2.3.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-classic.2.3\logback-classic-1.2.3.jar;C:\Users\akoul680\.m2\repository\io\projectreactor\reactor-core.4.10\reactor-core-3.4.10.jar;C:\Users\akoul680\.m2\repository\org\reactivestreams\reactive-streams.0.3\reactive-streams-1.0.3.jar;C:\Users\akoul680\.m2\repository\io\projectreactor\reactor-test.4.10\reactor-test-3.4.10.jar;C:\Users\akoul680\.m2\repository\commons-net\commons-net.6\commons-net-3.6.jar;C:\Users\akoul680\.m2\repository\com\box\box-java-sdk.32.0\box-java-sdk-2.32.0.jar;C:\Users\akoul680\.m2\repository\com\eclipsesource\minimal-json\minimal-json[=12=].9.1\minimal-json-0.9.1.jar;C:\Users\akoul680\.m2\repository\org\bitbucket\b_c\jose4j[=12=].4.4\jose4j-0.4.4.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcprov-jdk15on.52\bcprov-jdk15on-1.52.jar;C:\Users\akoul680\.m2\repository\com\jcraft\jsch[=12=].1.55\jsch-0.1.55.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-vfs2.4\commons-vfs2-2.4.jar;C:\Users\akoul680\.m2\repository\commons-logging\commons-logging.2\commons-logging-1.2.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcpkix-jdk15on.52\bcpkix-jdk15on-1.52.jar;C:\Users\akoul680\intellij-workspace\basics\lib\db2jcc4.jar" lrn.chapter14.ErrorHandling
2021-10-12T09:53:34,344 main r.util.Loggers - Using Slf4j logging framework
Exception in thread "main" java.lang.OutOfMemoryError: Forcing exception for 1
    at lrn.chapter14.ErrorHandling.lambda$null[=12=](ErrorHandling.java:19)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:281)
    at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.request(FluxMapFuseable.java:354)
    at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.request(FluxPeekFuseable.java:437)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.request(MonoPeekTerminal.java:139)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onSubscribe(MonoPeekTerminal.java:152)
    at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.onSubscribe(FluxPeekFuseable.java:471)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onSubscribe(FluxMapFuseable.java:263)
    at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4361)
    at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:426)
    at reactor.core.publisher.FluxRange$RangeSubscription.slowPath(FluxRange.java:156)
    at reactor.core.publisher.FluxRange$RangeSubscription.request(FluxRange.java:111)
    at reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:371)
    at reactor.core.publisher.FluxRange.subscribe(FluxRange.java:69)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8468)
    at reactor.core.publisher.Flux.subscribeWith(Flux.java:8641)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8438)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8362)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8280)
    at lrn.chapter14.ErrorHandling.ex5(ErrorHandling.java:26)
    at lrn.chapter14.ErrorHandling.main(ErrorHandling.java:12)

Process finished with exit code 1

您无法查询通量本身,但您可以告诉它在停止时做某事。

在包含您的 Kafka 侦听器的服务中,我建议添加一个默认为 false 的 terminated(或类似的)布尔标志。然后,您可以确保通量中的最后一个运算符是:

.doOnTerminate(() -> terminated = true)

...然后获取健康检查端点以监视该值,如果该标志曾经 true.

,则将容器标记为不健康

doOnTerminate() 在这个用例中比 doOnError() 更可靠,因为它会执行发布者是否因错误或完成信号而终止。不过根据评论,这并不完全可靠 - 如果您的发布者由于 JVM 错误或类似错误而终止,则 doOnTerminate() 运算符将不会是 运行.

根据我的经验,如果发生这种情况,通常是 ,因为 OutOfMemoryError,在这种情况下,-XX:+ExitOnOutOfMemoryError 是一个很好的 VM 选项(立即退出然后可以触发立即重启策略,而无需等待健康检查端点被调用并在一段时间后触发重启。)

请记住,还有其他致命的 JVM 错误不会被上述过程捕获,因此仍然不是 100% 可靠。