How to summarize data with arg_max() in KQL using two columns?

I have a table with the following columns:

RowNum is not part of the table. It is only included here so I can refer to individual records/rows.

RowNum  ID  Value                             ImportId    ImportTime
1       A   Doc A content as of May 11, 2022  2022-05-11  2022-05-11 13:00
2       B   Doc B content as of May 11, 2022  2022-05-11  2022-05-11 13:00
3       A   Doc A content as of May 11, 2022  2022-05-11  2022-05-11 17:00
4       B   Doc B content as of May 11, 2022  2022-05-11  2022-05-11 17:00
5       A   Doc A content as of May 14, 2022  2022-05-14  2022-05-17 08:00
6       B   Doc B content as of May 14, 2022  2022-05-14  2022-05-17 08:00
7       A   Doc A content as of May 14, 2022  2022-05-14  2022-05-17 10:00
8       B   Doc B content as of May 14, 2022  2022-05-14  2022-05-17 10:00
9       A   Doc A content as of May 11, 2022  2022-05-11  2022-05-18 15:00
10      B   Doc B content as of May 11, 2022  2022-05-11  2022-05-18 15:00

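Challenge: for each ID, I need the row with the latest ImportId (i.e. "2022-05-14"). Among rows with that ImportId, I need the one with the latest ImportTime. ImportId takes precedence. The overall latest ImportTime ("2022-05-18 15:00") belongs to an older ImportId, so those rows must not be selected.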

For the sample above, the result should contain two rows (rows 7 and 8), with ImportId "2022-05-14" and ImportTime "2022-05-17 10:00".

What I tried:

Approach 1

I used arg_max() on ImportTime:

T
| summarize arg_max(ImportTime, *) by ID

This returns the last two rows (9 and 10), whose ImportId is "2022-05-11". That's not what I want, because the latest ImportId is "2022-05-14".

Approach 2

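If I use arg_max(ImportId, *) by ID instead, I get rows with ImportId "2022-05-14", but in my case they are rows 5 and 6 rather than the ones with the latest ImportTime. Several rows tie on the maximum ImportId, and arg_max() returns one of the tied rows without regard to ImportTime.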
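Both approaches rank rows by a single column. The requirement is a two-key ordering: ImportId first, with ImportTime as the tie-breaker. Below is a minimal sketch of that ordering using sort and row_number(). It assumes ImportId and ImportTime are datetime columns, and it leaves out the document Value column for brevity.

```kql
// Sketch: per ID, keep the row that is latest by ImportId first and by ImportTime second.
// Assumption: ImportId and ImportTime are datetime; the Value column is omitted.
datatable(ID:string, ImportId:datetime, ImportTime:datetime)
[
    "A", datetime(2022-05-11), datetime(2022-05-11 13:00),
    "B", datetime(2022-05-11), datetime(2022-05-11 13:00),
    "A", datetime(2022-05-11), datetime(2022-05-11 17:00),
    "B", datetime(2022-05-11), datetime(2022-05-11 17:00),
    "A", datetime(2022-05-14), datetime(2022-05-17 08:00),
    "B", datetime(2022-05-14), datetime(2022-05-17 08:00),
    "A", datetime(2022-05-14), datetime(2022-05-17 10:00),
    "B", datetime(2022-05-14), datetime(2022-05-17 10:00),
    "A", datetime(2022-05-11), datetime(2022-05-18 15:00),
    "B", datetime(2022-05-11), datetime(2022-05-18 15:00)
]
// Newest first within each ID: ImportId is the primary key, ImportTime breaks ties.
| sort by ID asc, ImportId desc, ImportTime desc
// sort serializes the rows, so row_number() can restart at 1 whenever ID changes.
| extend rn = row_number(1, prev(ID) != ID)
| where rn == 1
| project-away rn
```

On the sample data, this should keep one row per ID with ImportId 2022-05-14 and ImportTime 2022-05-17 10:00 (rows 7 and 8).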

Approach 3

I concatenated ImportId and ImportTime into a single extended column and applied arg_max() to it. This seems to work, but I'm not sure it is correct in all cases:

T
| extend Combined = strcat(ImportId, ImportTime)
| summarize arg_max(Combined, *) by ID

This returns the expected rows 7 and 8, with ImportId "2022-05-14" and ImportTime "2022-05-17 10:00".
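The concatenated key is only safe if both parts become fixed-width strings whose text order matches their value order. That holds for the sample data. It is still an assumption worth checking for other string formats or for timestamps with sub-second parts.

A two-step sketch that avoids string conversion altogether is shown below. It assumes ImportId and ImportTime are datetime columns. The let-bound sample table T stands in for the real table.

```kql
// Sketch: two-step alternative to the string-concatenated key.
// Assumption: ImportId and ImportTime are datetime; T is a stand-in sample table.
let T = datatable(ID:string, ImportId:datetime, ImportTime:datetime)
[
    "A", datetime(2022-05-11), datetime(2022-05-11 13:00),
    "B", datetime(2022-05-11), datetime(2022-05-11 13:00),
    "A", datetime(2022-05-11), datetime(2022-05-11 17:00),
    "B", datetime(2022-05-11), datetime(2022-05-11 17:00),
    "A", datetime(2022-05-14), datetime(2022-05-17 08:00),
    "B", datetime(2022-05-14), datetime(2022-05-17 08:00),
    "A", datetime(2022-05-14), datetime(2022-05-17 10:00),
    "B", datetime(2022-05-14), datetime(2022-05-17 10:00),
    "A", datetime(2022-05-11), datetime(2022-05-18 15:00),
    "B", datetime(2022-05-11), datetime(2022-05-18 15:00)
];
T
// Step 1: keep only rows whose ImportId equals the latest ImportId for their ID.
// leftsemi returns matching left-side rows with the left-side columns only.
| join kind=leftsemi (
    T
    | summarize ImportId = max(ImportId) by ID
  ) on ID, ImportId
// Step 2: within that ImportId, arg_max() on ImportTime alone is enough.
| summarize arg_max(ImportTime, *) by ID
```

The design choice here is to resolve the primary key (ImportId) before arg_max() runs. That way arg_max() only has to order by one column.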

Is there a better option?

Check out the top-nested operator. In the queries below, the Value column plays the role of the question's ID column:

datatable(Value:string, ImportId:datetime, ImportTime:datetime)
[
    "A",    datetime(2022-05-11),   datetime(2022-05-11 13:00),
    "B",    datetime(2022-05-11),   datetime(2022-05-11 13:00),
    "A",    datetime(2022-05-11),   datetime(2022-05-11 17:00),
    "B",    datetime(2022-05-11),   datetime(2022-05-11 17:00),
    "A",    datetime(2022-05-14),   datetime(2022-05-17 08:00),
    "B",    datetime(2022-05-14),   datetime(2022-05-17 08:00),
    "A",    datetime(2022-05-14),   datetime(2022-05-17 10:00),
    "B",    datetime(2022-05-14),   datetime(2022-05-17 10:00),
    "A",    datetime(2022-05-11),   datetime(2022-05-18 15:00),
    "B",    datetime(2022-05-11),   datetime(2022-05-18 15:00)
]
| top-nested of Value by ignore=max(1),
  top-nested 1 of ImportId by max(ImportId),
  top-nested 1 of ImportTime by max(ImportTime)
| project Value, ImportId, ImportTime

Output:

Value  ImportId                     ImportTime
A      2022-05-14 00:00:00.0000000  2022-05-17 10:00:00.0000000
B      2022-05-14 00:00:00.0000000  2022-05-17 10:00:00.0000000

You can also try this approach, which nests the partition operator two levels deep:

datatable(Value:string, ImportId:datetime, ImportTime:datetime)
[
    "A",    datetime(2022-05-11),   datetime(2022-05-11 13:00),
    "B",    datetime(2022-05-11),   datetime(2022-05-11 13:00),
    "A",    datetime(2022-05-11),   datetime(2022-05-11 17:00),
    "B",    datetime(2022-05-11),   datetime(2022-05-11 17:00),
    "A",    datetime(2022-05-14),   datetime(2022-05-17 08:00),
    "B",    datetime(2022-05-14),   datetime(2022-05-17 08:00),
    "A",    datetime(2022-05-14),   datetime(2022-05-17 10:00),
    "B",    datetime(2022-05-14),   datetime(2022-05-17 10:00),
    "A",    datetime(2022-05-11),   datetime(2022-05-18 15:00),
    "B",    datetime(2022-05-11),   datetime(2022-05-18 15:00)
]
| partition hint.strategy = native by Value
(
    partition hint.strategy = native by ImportId
    (
        top 1 by ImportTime
    )
    | top 1 by ImportId
)
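Because top accepts several sort keys, a single partition level per ID may be enough. Below is a minimal sketch. It assumes the question's ID column (rather than Value) and datetime ImportId/ImportTime columns.

```kql
// Sketch: one partition per ID; top with two sort keys replaces the nested partition.
// Assumption: ImportId and ImportTime are datetime; the document Value column is omitted.
datatable(ID:string, ImportId:datetime, ImportTime:datetime)
[
    "A", datetime(2022-05-11), datetime(2022-05-11 13:00),
    "B", datetime(2022-05-11), datetime(2022-05-11 13:00),
    "A", datetime(2022-05-11), datetime(2022-05-11 17:00),
    "B", datetime(2022-05-11), datetime(2022-05-11 17:00),
    "A", datetime(2022-05-14), datetime(2022-05-17 08:00),
    "B", datetime(2022-05-14), datetime(2022-05-17 08:00),
    "A", datetime(2022-05-14), datetime(2022-05-17 10:00),
    "B", datetime(2022-05-14), datetime(2022-05-17 10:00),
    "A", datetime(2022-05-11), datetime(2022-05-18 15:00),
    "B", datetime(2022-05-11), datetime(2022-05-18 15:00)
]
| partition hint.strategy = native by ID
(
    // ImportId is compared first; ImportTime only breaks ties between equal ImportIds.
    top 1 by ImportId desc, ImportTime desc
)
```

Within each ID, this applies the same ordering that the nested version builds in two steps: latest ImportId, then latest ImportTime.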