KQL: aggregate rows by time shift and get the value of the more recent row
I think this is easiest to explain by looking at the data. We have an application that tracks all user interactions on our intranet.
eventType | pageUrl | timestamp | timeOnPageMs |
---|---|---|---|
pageEvent | https://url1.com/ | 2021-11-05T06:10:11.591Z | 0 |
pageEvent | https://url1.com/ | 2021-11-05T06:20:11.591Z | 23123 |
pageEvent | https://url2.com/ | 2021-11-05T06:11:11.591Z | 0 |
pageEvent | https://url2.com/ | 2021-11-05T06:30:11.591Z | 23123 |
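An open-page event is identified by timeOnPageMs = 0; any other value marks a close-page event.
I want to write a query that extracts all open-page events, but with the timeOnPageMs of the matching close-page event.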
eventType | pageUrl | timestamp | timeOnPageMs |
---|---|---|---|
pageEvent | https://url1.com/ | 2021-11-05T06:10:11.591Z | 23123 |
pageEvent | https://url2.com/ | 2021-11-05T06:11:11.591Z | 23123 |
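I tried using a UDF to look up the required value, but that does not seem to be possible, as stated in the last reply here.
Thanks in advance to anyone willing to help!
Giacomo S.S.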
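You need a SessionId to correlate the events (so that you can handle the case where the same URL has several open events followed by several close events for that same URL).
Then this is how you solve it: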
```kusto
// Sample events; each open/close pair shares a sessionId.
datatable(eventType:string, pageUrl:string, timestamp:datetime, timeOnPageMs:long, sessionId:string)
[
    "pageEvent","https://url1.com/",datetime(2021-11-05T06:10:11.591Z),0,"id1",
    "pageEvent","https://url1.com/",datetime(2021-11-05T06:10:15.591Z),0,"id2",
    "pageEvent","https://url1.com/",datetime(2021-11-05T06:11:12.591Z),1500,"id2",
    "pageEvent","https://url1.com/",datetime(2021-11-05T06:20:11.591Z),23123,"id1",
    "pageEvent","https://url2.com/",datetime(2021-11-05T06:11:11.591Z),0,"id3",
    "pageEvent","https://url2.com/",datetime(2021-11-05T06:30:11.591Z),23123,"id3"
]
// Per session: earliest timestamp is the open event, largest timeOnPageMs comes from the close event.
| summarize take_any(eventType, pageUrl), min(timestamp), max(timeOnPageMs) by sessionId
```
Result:
sessionId | eventType | pageUrl | min_timestamp | max_timeOnPageMs |
---|---|---|---|---|
id1 | pageEvent | https://url1.com/ | 2021-11-05 06:10:11.5910000 | 23123 |
id2 | pageEvent | https://url1.com/ | 2021-11-05 06:10:15.5910000 | 1500 |
id3 | pageEvent | https://url2.com/ | 2021-11-05 06:11:11.5910000 | 23123 |
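
If the events carry no sessionId at all, as in the table in the question, a rougher alternative is to pair each open event with the row that follows it for the same pageUrl. The query below is only a sketch under that assumption: the helper columns nextPageUrl and nextTimeOnPageMs are names made up for illustration, and the approach only works when the open and close events for a URL never interleave.

```kusto
// Sample rows from the question (no sessionId column).
datatable(eventType:string, pageUrl:string, timestamp:datetime, timeOnPageMs:long)
[
    "pageEvent","https://url1.com/",datetime(2021-11-05T06:10:11.591Z),0,
    "pageEvent","https://url1.com/",datetime(2021-11-05T06:20:11.591Z),23123,
    "pageEvent","https://url2.com/",datetime(2021-11-05T06:11:11.591Z),0,
    "pageEvent","https://url2.com/",datetime(2021-11-05T06:30:11.591Z),23123
]
// Sorting serializes the row set, which next() requires.
| sort by pageUrl asc, timestamp asc
// Look one row ahead (hypothetical helper columns).
| extend nextPageUrl = next(pageUrl), nextTimeOnPageMs = next(timeOnPageMs)
// Keep open events whose following row is a close event for the same URL.
| where timeOnPageMs == 0 and nextPageUrl == pageUrl and nextTimeOnPageMs > 0
// Report the open event with the close event's timeOnPageMs.
| project eventType, pageUrl, timestamp, timeOnPageMs = nextTimeOnPageMs
```

With interleaved visits, such as id1 and id2 in the answer's data, the first open event would be followed by another open event and dropped, so correlating by sessionId remains the more robust approach.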