根据记录计数总和创建日期范围(KQL、Azure 数据资源管理器、Kusto)
Create Date Ranges based on sum of record count (KQL, Azure Data Explorer, Kusto)
鉴于下面的 table,我想创建一个输出来查找最大日期范围,其中 'RecordCount' 的总和小于或等于 20,000。此外,如果单行大于 20,000,则结果日期范围将是该一天的开始和结束:
datatable(Date:string, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", 825,
"2022-04-25T00:00:00.0000000Z", 14925,
"2022-04-26T00:00:00.0000000Z", 18498,
"2022-04-27T00:00:00.0000000Z", 17558,
"2022-04-28T00:00:00.0000000Z", 15626,
"2022-04-29T00:00:00.0000000Z", 12755,
"2022-04-30T00:00:00.0000000Z", 2203,
"2022-05-01T00:00:00.0000000Z", 48594,
"2022-05-02T00:00:00.0000000Z", 4976,
"2022-05-03T00:00:00.0000000Z", 10835,
"2022-05-04T00:00:00.0000000Z", 27505,
"2022-05-05T00:00:00.0000000Z", 22808,
"2022-05-06T00:00:00.0000000Z", 23119,
"2022-05-07T00:00:00.0000000Z", 5141,
"2022-05-08T00:00:00.0000000Z", 2217,
"2022-05-09T00:00:00.0000000Z", 11334,
"2022-05-10T00:00:00.0000000Z", 58,
]
预期结果:
datatable(StartDate:datetime, EndDate:datetime, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", "2022-04-25T23:59:59.9999999Z", 15750,
"2022-04-26T00:00:00.0000000Z", "2022-04-26T23:59:59.9999999Z", 18498,
"2022-04-27T00:00:00.0000000Z", "2022-04-27T23:59:59.9999999Z", 17558,
"2022-04-28T00:00:00.0000000Z", "2022-04-28T23:59:59.9999999Z", 15626,
"2022-04-29T00:00:00.0000000Z", "2022-04-30T23:59:59.9999999Z", 14958,
"2022-05-01T00:00:00.0000000Z", "2022-05-01T23:59:59.9999999Z", 48594,
"2022-05-02T00:00:00.0000000Z", "2022-05-03T23:59:59.9999999Z", 15811,
"2022-05-04T00:00:00.0000000Z", "2022-05-04T23:59:59.9999999Z", 27505,
"2022-05-05T00:00:00.0000000Z", "2022-05-05T23:59:59.9999999Z", 22808,
"2022-05-06T00:00:00.0000000Z", "2022-05-06T23:59:59.9999999Z", 23119,
"2022-05-07T00:00:00.0000000Z", "2022-05-10T23:59:59.9999999Z", 18750,
]
基于 scan 运算符
datatable(Date:string, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", 825,
"2022-04-25T00:00:00.0000000Z", 14925,
"2022-04-26T00:00:00.0000000Z", 18498,
"2022-04-27T00:00:00.0000000Z", 17558,
"2022-04-28T00:00:00.0000000Z", 15626,
"2022-04-29T00:00:00.0000000Z", 12755,
"2022-04-30T00:00:00.0000000Z", 2203,
"2022-05-01T00:00:00.0000000Z", 48594,
"2022-05-02T00:00:00.0000000Z", 4976,
"2022-05-03T00:00:00.0000000Z", 10835,
"2022-05-04T00:00:00.0000000Z", 27505,
"2022-05-05T00:00:00.0000000Z", 22808,
"2022-05-06T00:00:00.0000000Z", 23119,
"2022-05-07T00:00:00.0000000Z", 5141,
"2022-05-08T00:00:00.0000000Z", 2217,
"2022-05-09T00:00:00.0000000Z", 11334,
"2022-05-10T00:00:00.0000000Z", 58,
]
| order by Date asc
| scan declare (acc_sum:long = 0, group_id:int = 0)
with
(
step s1 : true => acc_sum = RecordCount + iff(s1.acc_sum + RecordCount > 20000, 0, s1.acc_sum)
,group_id = s1.group_id + iff(s1.acc_sum + RecordCount > 20000, 1, 0);
)
| summarize StartDate = min(Date), EndDate = max(Date), RecordCount = sum(RecordCount) by group_id
| project-away group_id
StartDate
EndDate
RecordCount
2022-04-24T00:00:00.0000000Z
2022-04-25T00:00:00.0000000Z
15750
2022-04-26T00:00:00.0000000Z
2022-04-26T00:00:00.0000000Z
18498
2022-04-27T00:00:00.0000000Z
2022-04-27T00:00:00.0000000Z
17558
2022-04-28T00:00:00.0000000Z
2022-04-28T00:00:00.0000000Z
15626
2022-04-29T00:00:00.0000000Z
2022-04-30T00:00:00.0000000Z
14958
2022-05-01T00:00:00.0000000Z
2022-05-01T00:00:00.0000000Z
48594
2022-05-02T00:00:00.0000000Z
2022-05-03T00:00:00.0000000Z
15811
2022-05-04T00:00:00.0000000Z
2022-05-04T00:00:00.0000000Z
27505
2022-05-05T00:00:00.0000000Z
2022-05-05T00:00:00.0000000Z
22808
2022-05-06T00:00:00.0000000Z
2022-05-06T00:00:00.0000000Z
23119
2022-05-07T00:00:00.0000000Z
2022-05-10T00:00:00.0000000Z
18750
鉴于下面的 table,我想创建一个输出来查找最大日期范围,其中 'RecordCount' 的总和小于或等于 20,000。此外,如果单行大于 20,000,则结果日期范围将是该一天的开始和结束:
datatable(Date:string, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", 825,
"2022-04-25T00:00:00.0000000Z", 14925,
"2022-04-26T00:00:00.0000000Z", 18498,
"2022-04-27T00:00:00.0000000Z", 17558,
"2022-04-28T00:00:00.0000000Z", 15626,
"2022-04-29T00:00:00.0000000Z", 12755,
"2022-04-30T00:00:00.0000000Z", 2203,
"2022-05-01T00:00:00.0000000Z", 48594,
"2022-05-02T00:00:00.0000000Z", 4976,
"2022-05-03T00:00:00.0000000Z", 10835,
"2022-05-04T00:00:00.0000000Z", 27505,
"2022-05-05T00:00:00.0000000Z", 22808,
"2022-05-06T00:00:00.0000000Z", 23119,
"2022-05-07T00:00:00.0000000Z", 5141,
"2022-05-08T00:00:00.0000000Z", 2217,
"2022-05-09T00:00:00.0000000Z", 11334,
"2022-05-10T00:00:00.0000000Z", 58,
]
预期结果:
datatable(StartDate:datetime, EndDate:datetime, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", "2022-04-25T23:59:59.9999999Z", 15750,
"2022-04-26T00:00:00.0000000Z", "2022-04-26T23:59:59.9999999Z", 18498,
"2022-04-27T00:00:00.0000000Z", "2022-04-27T23:59:59.9999999Z", 17558,
"2022-04-28T00:00:00.0000000Z", "2022-04-28T23:59:59.9999999Z", 15626,
"2022-04-29T00:00:00.0000000Z", "2022-04-30T23:59:59.9999999Z", 14958,
"2022-05-01T00:00:00.0000000Z", "2022-05-01T23:59:59.9999999Z", 48594,
"2022-05-02T00:00:00.0000000Z", "2022-05-03T23:59:59.9999999Z", 15811,
"2022-05-04T00:00:00.0000000Z", "2022-05-04T23:59:59.9999999Z", 27505,
"2022-05-05T00:00:00.0000000Z", "2022-05-05T23:59:59.9999999Z", 22808,
"2022-05-06T00:00:00.0000000Z", "2022-05-06T23:59:59.9999999Z", 23119,
"2022-05-07T00:00:00.0000000Z", "2022-05-10T23:59:59.9999999Z", 18750,
]
基于 scan 运算符
datatable(Date:string, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", 825,
"2022-04-25T00:00:00.0000000Z", 14925,
"2022-04-26T00:00:00.0000000Z", 18498,
"2022-04-27T00:00:00.0000000Z", 17558,
"2022-04-28T00:00:00.0000000Z", 15626,
"2022-04-29T00:00:00.0000000Z", 12755,
"2022-04-30T00:00:00.0000000Z", 2203,
"2022-05-01T00:00:00.0000000Z", 48594,
"2022-05-02T00:00:00.0000000Z", 4976,
"2022-05-03T00:00:00.0000000Z", 10835,
"2022-05-04T00:00:00.0000000Z", 27505,
"2022-05-05T00:00:00.0000000Z", 22808,
"2022-05-06T00:00:00.0000000Z", 23119,
"2022-05-07T00:00:00.0000000Z", 5141,
"2022-05-08T00:00:00.0000000Z", 2217,
"2022-05-09T00:00:00.0000000Z", 11334,
"2022-05-10T00:00:00.0000000Z", 58,
]
| order by Date asc
| scan declare (acc_sum:long = 0, group_id:int = 0)
with
(
step s1 : true => acc_sum = RecordCount + iff(s1.acc_sum + RecordCount > 20000, 0, s1.acc_sum)
,group_id = s1.group_id + iff(s1.acc_sum + RecordCount > 20000, 1, 0);
)
| summarize StartDate = min(Date), EndDate = max(Date), RecordCount = sum(RecordCount) by group_id
| project-away group_id
StartDate | EndDate | RecordCount |
---|---|---|
2022-04-24T00:00:00.0000000Z | 2022-04-25T00:00:00.0000000Z | 15750 |
2022-04-26T00:00:00.0000000Z | 2022-04-26T00:00:00.0000000Z | 18498 |
2022-04-27T00:00:00.0000000Z | 2022-04-27T00:00:00.0000000Z | 17558 |
2022-04-28T00:00:00.0000000Z | 2022-04-28T00:00:00.0000000Z | 15626 |
2022-04-29T00:00:00.0000000Z | 2022-04-30T00:00:00.0000000Z | 14958 |
2022-05-01T00:00:00.0000000Z | 2022-05-01T00:00:00.0000000Z | 48594 |
2022-05-02T00:00:00.0000000Z | 2022-05-03T00:00:00.0000000Z | 15811 |
2022-05-04T00:00:00.0000000Z | 2022-05-04T00:00:00.0000000Z | 27505 |
2022-05-05T00:00:00.0000000Z | 2022-05-05T00:00:00.0000000Z | 22808 |
2022-05-06T00:00:00.0000000Z | 2022-05-06T00:00:00.0000000Z | 23119 |
2022-05-07T00:00:00.0000000Z | 2022-05-10T00:00:00.0000000Z | 18750 |