默认 window 和默认触发器如何在 Apache Beam 中工作
How do default window and default trigger work in apache beam
我正在尝试使用默认触发器实现默认 window 来评估行为,但它没有发出任何结果。
根据 Apache Beam:
The default trigger for a PCollection is based on event time, and
emits the results of the window when the Beam’s watermark passes the
end of the window, and then fires each time late data arrives.
If you are using both the default windowing configuration and
the default trigger, the default trigger emits exactly once, and late
data is discarded. This is because the default windowing configuration
has an allowed lateness value of 0.
我的代码:
Nb_items = lines | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
| 'print' >> beam.ParDo(PrintFn())
如果我设置了触发器,它只会发出数据
Nb_items = lines | 'window' >> beam.WindowInto(window.GlobalWindows(),
trigger=trigger.AfterProcessingTime(10),
accumulation_mode=trigger.AccumulationMode.DISCARDING) \
| 'CountGlobally' >> beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
| 'print' >> beam.ParDo(PrintFn())
如何在不设置触发器的情况下观察默认行为?
是合并转换的问题吗?
If your input PCollection
uses the default global windowing, the default behavior is to return a
PCollection containing one item. That item’s value comes from the
accumulator in the combine function that you specified when applying
Combine
查看以下来源:
有关 Google Cloud Dataflow
的更多信息
当前的问题是水印永远不会到达 GlobalWindow
的末尾。要使用默认触发器,您可以使用水印可以到达末尾的任何其他 window,例如:'window' >> beam.WindowInto(window.FixedWindows(10))
正如 Guillaume 正确地问到的那样,如果您 运行 使用 Batch,触发器基本上会被忽略。
我正在尝试使用默认触发器实现默认 window 来评估行为,但它没有发出任何结果。
根据 Apache Beam:
The default trigger for a PCollection is based on event time, and emits the results of the window when the Beam’s watermark passes the end of the window, and then fires each time late data arrives.
If you are using both the default windowing configuration and the default trigger, the default trigger emits exactly once, and late data is discarded. This is because the default windowing configuration has an allowed lateness value of 0.
我的代码:
Nb_items = lines | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
| 'print' >> beam.ParDo(PrintFn())
如果我设置了触发器,它只会发出数据
Nb_items = lines | 'window' >> beam.WindowInto(window.GlobalWindows(),
trigger=trigger.AfterProcessingTime(10),
accumulation_mode=trigger.AccumulationMode.DISCARDING) \
| 'CountGlobally' >> beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults() \
| 'print' >> beam.ParDo(PrintFn())
如何在不设置触发器的情况下观察默认行为?
是合并转换的问题吗?
If your input PCollection uses the default global windowing, the default behavior is to return a PCollection containing one item. That item’s value comes from the accumulator in the combine function that you specified when applying Combine
查看以下来源:
有关 Google Cloud Dataflow
的更多信息当前的问题是水印永远不会到达 GlobalWindow
的末尾。要使用默认触发器,您可以使用水印可以到达末尾的任何其他 window,例如:'window' >> beam.WindowInto(window.FixedWindows(10))
正如 Guillaume 正确地问到的那样,如果您 运行 使用 Batch,触发器基本上会被忽略。