Convert query to KQL

Question

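I have an application with an object that has two GUID values, guid1 and guid2. At any given time only one of them is populated: first guid1, then later guid2. The state changes are logged to the table below. I want to group the records so that all GUIDs belonging to one object are grouped together.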

So this table ...

timestamp                      guid1                                  guid2                             text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null                            | abc1  
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | null                            | abc2
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null                                 | b7ef78cde158437fb65a6878ca908751| ghi1
2022-05-06T10:02:31.5767324Z | null                                 | 206eb977459c4f91bafb9b798f5d60c4| ghi2

... becomes this result set

timestamp                      guid1                                  guid2                             text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null                            | abc1  
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:02:31.5767324Z | null                                 | b7ef78cde158437fb65a6878ca908751| ghi1    
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | null                            | abc2
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null                                 | 206eb977459c4f91bafb9b798f5d60c4| ghi2

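In the desired output above, taking just the first 3 records as an example, the records are now grouped to show the full history of state changes. The first record shows guid1 with the value cb73c58e-e36b-4fe3-8663-33027ba2afc7, the next record shows guid2 becoming active, and in the last one guid1 is null and only guid2 is present. The same grouping can be seen in the last 3 records, for guid1 ec5d1b9395444a06a36130a9d62ae2c5.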

I struggled to do this in SQL, never mind KQL. I asked the question on a separate SQL thread, and the suggested solution is below. I am having difficulty converting it to KQL.

select * 
from t
order by Row_Number() over(partition by [timestamp] order by [timestamp]),
guid1 desc, guid2;

KQL supports row_number and partition by, but I see no reference to over, so I am not sure how to achieve this.

Perhaps there is a more KQL-friendly way to achieve this?

[Commented solution]

let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
     '2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,''                                 ,'abc1'  
    ,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,''                                 ,'abc2'
    ,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
    ,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
    ,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff'     ,''                                 ,'xxx1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,''                                 ,'yyy1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
// select records where guid1 is not empty

| summarize take_any(guid2) by guid1
// of that recordset, keep one row per guid1 with a single accompanying guid2. take_any picks an arbitrary value but tries to return a non-empty one when it exists

| serialize 
// serialize the recordset to enable the use of window functions later in query

| extend gid = row_number() 
// mark this recordset with a parent guid row identifier

| mv-expand g = pack_array(guid1, guid2) to typeof(string)
// pack guid1 and guid2 into an array and expand it, giving one row per guid with the same gid

| where isnotempty(g)
// drop the expanded rows that came from an empty guid2

| project g, gid 
// reduce recordset to array of related guid1/guid2 and associated parent guid row identifier

| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
// inner join on original recordset 

| project-away g, g1
// exclude g, g1 columns from recordset

| partition hint.strategy=native by gid
  (
      order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
    | extend rid = row_number()
  )
// - partition the recordset by gid (to group related parent guid records),
// - order by gid with an order preference of non empty guid1Ids/empty guid2Ids over empty guid1Ids/non empty guid2Ids
// - mark each record with a row id

| order by gid asc, rid asc 
// order recordset

| project-reorder gid, rid
// reorder gid column to appear before rid

Answer 1

The assumption is that if we have a record with a guid2 and an empty guid1, we also have a record with that same guid2 and a non-empty guid1.

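Part 1

Create a guid dictionary
We take all records that have guid1.
From these records we keep only 1 record per guid1, preferably one with guid2.
We number those records arbitrarily. These numbers will later represent groups of records (gid = group id).
We now duplicate every record that has both guid1 and guid2: in one copy we keep guid1 and in the other we keep guid2. Note that both copies have the same record number.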
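
The Part 1 steps can be sketched in isolation as follows. This is only a minimal sketch on a cut-down, hypothetical copy of the data (the name `sample` and the short guid values are made up for illustration, and timestamp and text are left out). It stops right after the dictionary is built so that the intermediate result can be inspected.

```kql
// Hypothetical cut-down sample: one object that moves from guid1 to guid2,
// and one object that only ever has guid1.
let sample = datatable(guid1:string, guid2:string)
[
     'A1', ''
    ,'A1', 'A2'
    ,'',   'A2'
    ,'B1', ''
];
sample
| where isnotempty(guid1)               // records that have guid1
| summarize take_any(guid2) by guid1    // one record per guid1; take_any tries to return a non-empty guid2
| serialize                             // fix the row order so row_number() can be used
| extend gid = row_number()             // arbitrary group id per guid1
| mv-expand g = pack_array(guid1, guid2) to typeof(string)  // one row per guid, same gid
| where isnotempty(g)                   // drop the row that came from an empty guid2
| project g, gid
```

With this input the result should contain three rows: A1 and A2 sharing one gid, and B1 with a different gid. Which group receives which number is arbitrary, because the order of rows coming out of summarize is not guaranteed.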

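Part 2

Match each record of the original table to a guid from the dictionary
If a record has guid1 (and maybe also guid2), we join it to guid1 from the dictionary.
If a record does not have guid1, we join it to guid2.
At this point each record has a gid. Records whose guid1 and guid2 are connected have the same gid.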
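
The matching step can be sketched on its own like this. The dictionary here is written by hand in the shape Part 1 would produce, and both `dict` and `sample` are hypothetical names with made-up values, so treat it as an illustration rather than the exact query.

```kql
// Hypothetical dictionary, shaped like the output of Part 1.
let dict = datatable(g:string, gid:long)
[
     'A1', 1
    ,'A2', 1
    ,'B1', 2
];
// Hypothetical cut-down records.
let sample = datatable(guid1:string, guid2:string, text:string)
[
     'A1', '',   'first'
    ,'A1', 'A2', 'second'
    ,'',   'A2', 'third'
    ,'B1', '',   'other'
];
dict
| join kind=inner (
    sample
    | extend g = coalesce(guid1, guid2)   // guid1 when present, otherwise guid2
  ) on g
| project-away g, g1                      // drop the join keys (g1 is the right-hand copy of g)
| order by gid asc
```

Here the records 'first', 'second' and 'third' should all end up with gid 1 and 'other' with gid 2. The record that has both guids is matched through guid1 only, so it is not duplicated by the join.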

Part 3

Optional. Number the records within each group: first the records with only guid1, then the records with both guids, and last the records with only guid2.
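
Numbering within a group is the job that ROW_NUMBER() OVER (PARTITION BY ...) does in SQL. As an alternative to the partition operator, one possible sketch is to sort the rows and let row_number() restart whenever gid changes. The table `grouped` and the helper columns has_guid1 and no_guid2 are hypothetical and exist only for this illustration.

```kql
// Hypothetical records that already carry a gid, as they would after Part 2.
let grouped = datatable(gid:long, guid1:string, guid2:string, text:string)
[
     1, '',   'A2', 'third'
    ,2, 'B1', '',   'other'
    ,1, 'A1', 'A2', 'second'
    ,1, 'A1', '',   'first'
];
grouped
| extend has_guid1 = iff(isnotempty(guid1), 1, 2)   // records with guid1 sort first
| extend no_guid2  = iff(isempty(guid2), 1, 2)      // among those, records without guid2 sort first
| order by gid asc, has_guid1 asc, no_guid2 asc      // order by also serializes the row set
| extend rid = row_number(1, prev(gid) != gid)       // restart the numbering whenever gid changes
| project-away has_guid1, no_guid2
```

For this input the rows of gid 1 should be numbered 'first' = 1, 'second' = 2, 'third' = 3, and 'other' should get rid 1 in gid 2. The second argument of row_number() is the restart condition, which is what stands in for the SQL PARTITION BY here.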

let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
     '2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,''                                 ,'abc1'  
    ,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,''                                 ,'abc2'
    ,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
    ,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
    ,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff'     ,''                                 ,'xxx1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,''                                 ,'yyy1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
| summarize take_any(guid2) by guid1
| serialize 
| extend gid = row_number() 
| mv-expand g = pack_array(guid1, guid2) to typeof(string)
| where isnotempty(g)
| project g, gid 
| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
| project-away g, g1
| partition hint.strategy=native by gid
  (
      order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
    | extend rid = row_number()
  )
| order by gid asc, rid asc 
| project-reorder gid, rid

gid	rid	timestamp	guid1	guid2	text
1	1	2022-05-06T10:00:31.5767324Z	cb73c58e-e36b-4fe3-8663-33027ba2afc7		abc1
1	2	2022-05-06T10:01:31.5767324Z	cb73c58e-e36b-4fe3-8663-33027ba2afc7	b7ef78cde158437fb65a6878ca908751	def1
1	3	2022-05-06T10:02:31.5767324Z		b7ef78cde158437fb65a6878ca908751	ghi1
2	1	2022-05-06T10:00:31.5767324Z	ec5d1b9395444a06a36130a9d62ae2c5		abc2
2	2	2022-05-06T10:01:31.5767324Z	ec5d1b9395444a06a36130a9d62ae2c5	206eb977459c4f91bafb9b798f5d60c4	def2
2	3	2022-05-06T10:02:31.5767324Z		206eb977459c4f91bafb9b798f5d60c4	ghi2
3	1	2022-05-06T10:03:31.5767324Z	fee3d3522a3942a69802774f8a5128ff		xxx1
4	1	2022-05-06T10:04:31.5767324Z	48b04d074cd141dba6eb9a354d26be0a		yyy1
4	2	2022-05-06T10:04:31.5767324Z	48b04d074cd141dba6eb9a354d26be0a	0d2ac92589634b27a171be005375b1b5	yyy2

