Cloud Pub/Sub subscriber repeats messages over 600ms

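We recently integrated Google Pub/Sub into our application, and some of our long-running tasks are now having problems because they sometimes take more than a minute. We have configured the subscriber's acknowledgement deadline to 600 seconds; however, anything that takes longer than 600 ms is retried by Pub/Sub.

Here is our configuration: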

gcloud pubsub subscriptions describe name

ackDeadlineSeconds: 600
expirationPolicy: {}
messageRetentionDuration: 604800s

We're not sure what the problem is. As a result, most of our tasks are executed more than once.

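Pub/Sub has built-in at-least-once delivery, which means it retries unacknowledged messages. In this case, once 600 seconds have passed without an acknowledgement, the message you first received is treated as unacknowledged, so Pub/Sub delivers it again. It keeps retrying every 600 seconds until either messageRetentionDuration is reached or you acknowledge the message.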
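To put numbers on that retry behaviour, here is a rough back-of-the-envelope sketch using the values from the subscription above. It assumes exactly one redelivery each time the ack deadline expires and that the message is never acknowledged; real delivery timing can differ, and the function name is made up for illustration.

```python
# Values copied from the `gcloud pubsub subscriptions describe` output above.
ACK_DEADLINE_SECONDS = 600
MESSAGE_RETENTION_SECONDS = 604800  # 7 days


def estimated_deliveries(retention_seconds, ack_deadline_seconds):
    """Rough count of deliveries for a message that is never acknowledged.

    Assumption: one delivery per expired ack deadline, nothing faster.
    """
    return retention_seconds // ack_deadline_seconds


print(estimated_deliveries(MESSAGE_RETENTION_SECONDS, ACK_DEADLINE_SECONDS))  # 1008
```

Under those assumptions, a single task that never acknowledges its message could be started on the order of a thousand times before the message ages out.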

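Keep in mind that the documentation specifies that your subscriber should be idempotent. So making your code able to handle duplicate deliveries of the same message is probably the best way to deal with this issue.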
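One possible way to approach idempotency is to remember which message IDs have already been handled and skip the work on a repeat delivery. The following is only a minimal sketch: the class name and the `process` function are hypothetical, the ID set lives in process memory (a real deployment would likely need a shared, persistent store), and it relies only on the `message_id`, `data` and `ack()` members that the Python client's message objects expose.

```python
import threading


class DedupingCallback:
    """Skips messages whose ID has already been processed successfully.

    Assumption: IDs are kept in memory only, so the record is lost on
    restart and is not shared between subscriber processes.
    """

    def __init__(self, process):
        self._process = process        # hypothetical function taking the payload bytes
        self._done = set()             # IDs of messages processed successfully
        self._lock = threading.Lock()  # callbacks may run on several threads

    def __call__(self, message):
        with self._lock:
            already_done = message.message_id in self._done
        if already_done:
            message.ack()              # duplicate delivery: acknowledge, skip the work
            return
        self._process(message.data)    # an exception here leaves the message unacked
        with self._lock:
            self._done.add(message.message_id)
        message.ack()
```

An instance could then be passed as the callback to the client library's `subscribe` call. Note that two deliveries of the same message arriving at the same moment could still both be processed; closing that gap would need an "in progress" state as well.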

You could also reduce messageRetentionDuration to 600s (which is the minimum value), so that anything past the 10-minute mark is not retried.

In addition, the FAQ states:

Why are there too many duplicate messages?

Cloud Pub/Sub guarantees at-least-once message delivery, which means that occasional duplicates are to be expected. However, a high rate of duplicates may indicate that the client is not acknowledging messages within the configured ack_deadline_seconds, and Cloud Pub/Sub is retrying the message delivery. This can be observed in the monitoring metrics. pubsub.googleapis.com/subscription/pull_ack_message_operation_count for pull subscriptions, and pubsub.googleapis.com/subscription/push_request_count for push subscriptions. Look for elevated expired or webhook_timeout values in the /response_code. This is particularly likely if there are many small messages, since Cloud Pub/Sub may batch messages internally and a partially acknowledged batch will be fully redelivered.

Another possibility is that the subscriber is not acknowledging some messages because the code path processing those specific messages fails, and the Acknowledge call is never made; or the push endpoint never responds or responds with an error.
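To illustrate the second point in the quoted FAQ, here is a small sketch of a wrapper that settles the message on every code path: it acknowledges on success and negatively acknowledges on failure, so no branch silently leaves the message hanging until the deadline expires. The decorator name and the `process` function are hypothetical; `ack()` and `nack()` are the methods on the Python client's message objects.

```python
import functools
import logging


def always_settle(process):
    """Wrap `process` so that every message ends in either ack() or nack()."""

    @functools.wraps(process)
    def callback(message):
        try:
            process(message.data)
        except Exception:
            # Processing failed: log it and ask Pub/Sub to redeliver.
            logging.exception("processing failed for message %s", message.message_id)
            message.nack()
        else:
            # Processing succeeded: acknowledge so the message is not retried.
            message.ack()

    return callback
```

Whether a failed message should be nacked for another attempt or acked and recorded elsewhere depends on whether the failure is likely to be transient, so treat the choice here as one option rather than a recommendation.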