默认 window 和默认触发器如何在 Apache Beam 中工作

How do default window and default trigger work in apache beam

我正在尝试使用默认触发器实现默认 window 来评估行为,但它没有发出任何结果。

根据 Apache Beam:

The default trigger for a PCollection is based on event time, and emits the results of the window when the Beam’s watermark passes the end of the window, and then fires each time late data arrives.


If you are using both the default windowing configuration and the default trigger, the default trigger emits exactly once, and late data is discarded. This is because the default windowing configuration has an allowed lateness value of 0.

我的代码:

Nb_items = lines | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
                 | 'print' >> beam.ParDo(PrintFn())

如果我设置了触发器,它只会发出数据

Nb_items = lines | 'window' >> beam.WindowInto(window.GlobalWindows(),
            trigger=trigger.AfterProcessingTime(10),
            accumulation_mode=trigger.AccumulationMode.DISCARDING) \
        | 'CountGlobally' >> beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
        | 'print' >> beam.ParDo(PrintFn())

如何在不设置触发器的情况下观察默认行为?

是合并转换的问题吗?

If your input PCollection uses the default global windowing, the default behavior is to return a PCollection containing one item. That item’s value comes from the accumulator in the combine function that you specified when applying Combine

查看以下来源:

https://github.com/apache/beam/blob/828b897a2439437d483b1bd7f2a04871f077bde0/examples/java/src/main/java/org/apache/beam/examples/complete/game/LeaderBoard.java#L274

有关 Google Cloud Dataflow

的更多信息

当前的问题是水印永远不会到达 GlobalWindow 的末尾。要使用默认触发器,您可以使用水印可以到达末尾的任何其他 window,例如:'window' >> beam.WindowInto(window.FixedWindows(10))

正如 Guillaume 正确地问到的那样,如果您 运行 使用 Batch,触发器基本上会被忽略。