Apache Beam 窗口:考虑延迟数据但仅发出一个窗格
Apache beam windowing: consider late data but emit only one pane
我想在水印到达 window 结束后 x 分钟时发出一个窗格。这让我确保我处理了一些迟到的数据,但仍然只发出一个窗格。我目前在 java.
工作
目前我找不到解决这个问题的正确方法。当水印到达 window 的末尾时,我可以发出一个窗格,但随后所有迟到的数据都会被丢弃。我可以在 window 末尾发出窗格,然后在收到延迟数据时再次发出窗格,但是在这种情况下,我不会发出单个窗格。
我目前有类似这样的代码:
.triggering(
// This is going to emit the pane, but I don't want emit the pane yet!
AfterWatermark.pastEndOfWindow()
// This is going to emit panes each time I receive late data, however
// I would like to only emit one pane at the end of the allowedLateness
).withAllowedLateness(allowedLateness).accumulatingFiredPanes())
如果仍然存在混淆,我想在水印通过 allowedLateness
时只发出一个窗格。
我要做的是,首先将 Window.ClosingBehavior
to FIRE_ALWAYS
. This way, when the window is permanently closed it will send a final pane (even if there are no late records since the last pane) with PaneInfo.isLast
设置为 true
。
那么,我会继续第二个选项:
I could emit the pane at the end of the window and then again when I
receive late data, however in this case I am not emitting a single
pane.
但是在下游丢弃不是最终的窗格,例如:
public void processElement(ProcessContext c) {
if (c.pane().isLast) {
c.output(c.element());
}
}
谢谢 Guillem,最后我用你的答案找到了这个 very useful link 有很多 apache beam 的例子。由此我想出了以下解决方案:
// We first specify to never emit any panes
.triggering(Never.ever())
// We then specify to fire always when closing the window. This will emit a
// single final pane at the end of allowedLateness
.withAllowedLateness(allowedLateness, Window.ClosingBehavior.FIRE_ALWAYS)
.discardingFiredPanes())
我想在水印到达 window 结束后 x 分钟时发出一个窗格。这让我确保我处理了一些迟到的数据,但仍然只发出一个窗格。我目前在 java.
工作目前我找不到解决这个问题的正确方法。当水印到达 window 的末尾时,我可以发出一个窗格,但随后所有迟到的数据都会被丢弃。我可以在 window 末尾发出窗格,然后在收到延迟数据时再次发出窗格,但是在这种情况下,我不会发出单个窗格。
我目前有类似这样的代码:
.triggering(
// This is going to emit the pane, but I don't want emit the pane yet!
AfterWatermark.pastEndOfWindow()
// This is going to emit panes each time I receive late data, however
// I would like to only emit one pane at the end of the allowedLateness
).withAllowedLateness(allowedLateness).accumulatingFiredPanes())
如果仍然存在混淆,我想在水印通过 allowedLateness
时只发出一个窗格。
我要做的是,首先将 Window.ClosingBehavior
to FIRE_ALWAYS
. This way, when the window is permanently closed it will send a final pane (even if there are no late records since the last pane) with PaneInfo.isLast
设置为 true
。
那么,我会继续第二个选项:
I could emit the pane at the end of the window and then again when I receive late data, however in this case I am not emitting a single pane.
但是在下游丢弃不是最终的窗格,例如:
public void processElement(ProcessContext c) {
if (c.pane().isLast) {
c.output(c.element());
}
}
谢谢 Guillem,最后我用你的答案找到了这个 very useful link 有很多 apache beam 的例子。由此我想出了以下解决方案:
// We first specify to never emit any panes
.triggering(Never.ever())
// We then specify to fire always when closing the window. This will emit a
// single final pane at the end of allowedLateness
.withAllowedLateness(allowedLateness, Window.ClosingBehavior.FIRE_ALWAYS)
.discardingFiredPanes())