How long does Dataflow remember the ID attribute in PubsubIO?
PubsubIO can deduplicate messages based on an ID attribute:
PubsubIO.readStrings().fromSubscription(pubSubSubscription).withIdAttribute("message_id")
How long does Dataflow remember this ID? Is it documented anywhere?
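It is documented, but the information has not yet been migrated to the V2+ version of the documentation. It can still be found in the V1 documentation: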
https://cloud.google.com/dataflow/model/pubsub-io#using-record-ids
"If you've set a record ID label when using PubsubIO.Read, when Dataflow receives multiple messages with the same ID (which will be read from the attribute with the name of the string you passed to idLabel), Dataflow will discard all but one of the messages. However, Dataflow does not perform this de-duplication for messages with the same record ID value that are published to Cloud Pub/Sub more than 10 minutes apart."
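To make the quoted behavior concrete, here is a minimal sketch in plain Java that models a 10-minute record-ID window. It is not Dataflow's implementation: the class name `RecordIdWindowSketch`, the `accept` method, and the in-memory map are all hypothetical. It also assumes that duplicates published up to 10 minutes apart are always dropped, and that the window is measured from the publish time of the message that was kept. The quoted documentation only states that deduplication is not performed beyond 10 minutes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; this is NOT how Dataflow is implemented.
class RecordIdWindowSketch {
    // The 10-minute figure is taken from the quoted V1 documentation.
    static final Duration WINDOW = Duration.ofMinutes(10);

    // Hypothetical store: record ID -> publish time of the message that was kept.
    private final Map<String, Instant> keptAt = new HashMap<>();

    /** Returns true if the message is kept, false if it is dropped as a duplicate. */
    boolean accept(String recordId, Instant publishTime) {
        Instant previous = keptAt.get(recordId);
        if (previous != null
                && Duration.between(previous, publishTime).abs().compareTo(WINDOW) <= 0) {
            return false; // same ID published within the window: treated as a duplicate
        }
        // Either a new ID, or the same ID published more than 10 minutes apart.
        keptAt.put(recordId, publishTime);
        return true;
    }
}
```

In this model, a message that reuses an ID 5 minutes after the first one is dropped, while one that reuses it 11 minutes later passes through again. If duplicates can arrive further apart than that, the pipeline would need its own deduplication step.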