CollectTop is returning more rows than I would expect in Azure Stream Analytics
I uploaded the following input (testing in the Azure portal):
[
{"engineid":"engine001","eventtime":1,"tmp":19.3,"hum":0.22},
{"engineid":"engine001","eventtime":2,"tmp":19.7,"hum":0.21},
{"engineid":"engine002","eventtime":3,"tmp":20.4,"hum":0.25},
{"engineid":"engine001","eventtime":4,"tmp":19.6,"hum":0.24}
]
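I then tried to group the records so that I get the last two rows for each engine. As you can see in the sample, there are only 2 distinct engines, so I expected the output to contain two records, each holding the ranked rows, but I get 4 output records instead.
This is my query: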
-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT engineid, tmp, hum, eventtime
    FROM [engine-telemetry]
    WHERE engineid IS NOT NULL
),
-- Grouping by engineid in TimeWindows
TimeWindows AS
(
    SELECT engineid,
        CollectTop(2) OVER (ORDER BY eventtime DESC) as TimeWindow
    FROM
        [RelevantTelemetry]
    WHERE engineid IS NOT NULL
    GROUP BY SlidingWindow(hour, 24), engineid
)
-- Output timewindows for verification purposes
SELECT TimeWindow
INTO debug
FROM TimeWindows
I have tried adding a TIMESTAMP BY clause, changing the order of the GROUP BY, and so on, but I still get the following 4 records instead of the 2 I expect:
Any ideas?
[
{"TimeWindow":
[
{"rank":1,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
]},
{"TimeWindow":
[
{"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
{"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}},
{"rank":3,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
]},
{"TimeWindow":
[
{"rank":1,"value":{"engineid":"engine002","tmp":0.0017,"hum":0.0003,"eventtime":3}}
]},
{"TimeWindow":
[
{"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
{"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}}
]}
]
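As @SteveZhao suggested, you need to use GROUP BY TumblingWindow(hour, 24), engineid
instead of GROUP BY SlidingWindow(hour, 24), engineid
Sliding windows can overlap in time, so the same event can be included in more than one window, and each of those windows produces its own output row. Tumbling windows do not overlap, so each event belongs to exactly one window.
For more information, see: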
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
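For reference, here is a sketch of the query from the question with only the window function swapped, as suggested above. It assumes all four test events fall inside the same 24-hour tumbling window; if they straddle a window boundary you would still see more than one row per engine. The second WHERE clause is dropped because RelevantTelemetry already filters out null engine IDs.

```sql
-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT engineid, tmp, hum, eventtime
    FROM [engine-telemetry]
    WHERE engineid IS NOT NULL
),
-- Grouping by engineid in non-overlapping 24-hour windows
TimeWindows AS
(
    SELECT engineid,
        CollectTop(2) OVER (ORDER BY eventtime DESC) AS TimeWindow
    FROM RelevantTelemetry
    -- TumblingWindow instead of SlidingWindow: one result per engine per window
    GROUP BY TumblingWindow(hour, 24), engineid
)
-- Output time windows for verification purposes
SELECT TimeWindow
INTO debug
FROM TimeWindows
```

With the sample input, this should give one record for engine001 (the rows with eventtime 4 and 2) and one for engine002 (the row with eventtime 3), provided the assumption about the window boundary holds.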