Create Date Ranges based on sum of record count (KQL, Azure Data Explorer, Kusto)

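Given the table below, I want to produce an output that finds the largest date ranges in which the sum of 'RecordCount' is less than or equal to 20,000. If a single row is already greater than 20,000, the resulting date range should be just that one day, from its start to its end: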

datatable(Date:string, RecordCount:long)
[
    "2022-04-24T00:00:00.0000000Z", 825, 
    "2022-04-25T00:00:00.0000000Z", 14925, 
    "2022-04-26T00:00:00.0000000Z", 18498, 
    "2022-04-27T00:00:00.0000000Z", 17558, 
    "2022-04-28T00:00:00.0000000Z", 15626, 
    "2022-04-29T00:00:00.0000000Z", 12755, 
    "2022-04-30T00:00:00.0000000Z", 2203, 
    "2022-05-01T00:00:00.0000000Z", 48594, 
    "2022-05-02T00:00:00.0000000Z", 4976, 
    "2022-05-03T00:00:00.0000000Z", 10835, 
    "2022-05-04T00:00:00.0000000Z", 27505, 
    "2022-05-05T00:00:00.0000000Z", 22808, 
    "2022-05-06T00:00:00.0000000Z", 23119, 
    "2022-05-07T00:00:00.0000000Z", 5141, 
    "2022-05-08T00:00:00.0000000Z", 2217, 
    "2022-05-09T00:00:00.0000000Z", 11334, 
    "2022-05-10T00:00:00.0000000Z", 58, 
]

Expected result:

datatable(StartDate:datetime, EndDate:datetime, RecordCount:long)
[
    "2022-04-24T00:00:00.0000000Z", "2022-04-25T23:59:59.9999999Z", 15750, 
    "2022-04-26T00:00:00.0000000Z", "2022-04-26T23:59:59.9999999Z", 18498, 
    "2022-04-27T00:00:00.0000000Z", "2022-04-27T23:59:59.9999999Z", 17558, 
    "2022-04-28T00:00:00.0000000Z", "2022-04-28T23:59:59.9999999Z", 15626, 
    "2022-04-29T00:00:00.0000000Z", "2022-04-30T23:59:59.9999999Z", 14958, 
    "2022-05-01T00:00:00.0000000Z", "2022-05-01T23:59:59.9999999Z", 48594, 
    "2022-05-02T00:00:00.0000000Z", "2022-05-03T23:59:59.9999999Z", 15811, 
    "2022-05-04T00:00:00.0000000Z", "2022-05-04T23:59:59.9999999Z", 27505, 
    "2022-05-05T00:00:00.0000000Z", "2022-05-05T23:59:59.9999999Z", 22808, 
    "2022-05-06T00:00:00.0000000Z", "2022-05-06T23:59:59.9999999Z", 23119, 
    "2022-05-07T00:00:00.0000000Z", "2022-05-10T23:59:59.9999999Z", 18750, 
]

A solution based on the scan operator:

datatable(Date:string, RecordCount:long)
[
    "2022-04-24T00:00:00.0000000Z", 825, 
    "2022-04-25T00:00:00.0000000Z", 14925, 
    "2022-04-26T00:00:00.0000000Z", 18498, 
    "2022-04-27T00:00:00.0000000Z", 17558, 
    "2022-04-28T00:00:00.0000000Z", 15626, 
    "2022-04-29T00:00:00.0000000Z", 12755, 
    "2022-04-30T00:00:00.0000000Z", 2203, 
    "2022-05-01T00:00:00.0000000Z", 48594, 
    "2022-05-02T00:00:00.0000000Z", 4976, 
    "2022-05-03T00:00:00.0000000Z", 10835, 
    "2022-05-04T00:00:00.0000000Z", 27505, 
    "2022-05-05T00:00:00.0000000Z", 22808, 
    "2022-05-06T00:00:00.0000000Z", 23119, 
    "2022-05-07T00:00:00.0000000Z", 5141, 
    "2022-05-08T00:00:00.0000000Z", 2217, 
    "2022-05-09T00:00:00.0000000Z", 11334, 
    "2022-05-10T00:00:00.0000000Z", 58, 
]
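| order by Date asc   // scan processes rows in this order
| scan declare (acc_sum:long = 0, group_id:int = 0)
with
(
    // s1.acc_sum and s1.group_id hold the values computed for the previous row.
    // If adding the current row would push the running total past 20,000,
    // start a new group whose total begins with this row; otherwise keep accumulating.
    // A single row above 20,000 therefore always ends up in a group of its own.
    step s1 : true => acc_sum  = RecordCount + iff(s1.acc_sum + RecordCount > 20000, 0, s1.acc_sum)
                     ,group_id = s1.group_id + iff(s1.acc_sum + RecordCount > 20000, 1, 0);
)
// Collapse each group into one date range with its total record count
| summarize StartDate = min(Date), EndDate = max(Date), RecordCount = sum(RecordCount) by group_id
| project-away group_id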

Output:

StartDate EndDate RecordCount
2022-04-24T00:00:00.0000000Z 2022-04-25T00:00:00.0000000Z 15750
2022-04-26T00:00:00.0000000Z 2022-04-26T00:00:00.0000000Z 18498
2022-04-27T00:00:00.0000000Z 2022-04-27T00:00:00.0000000Z 17558
2022-04-28T00:00:00.0000000Z 2022-04-28T00:00:00.0000000Z 15626
2022-04-29T00:00:00.0000000Z 2022-04-30T00:00:00.0000000Z 14958
2022-05-01T00:00:00.0000000Z 2022-05-01T00:00:00.0000000Z 48594
2022-05-02T00:00:00.0000000Z 2022-05-03T00:00:00.0000000Z 15811
2022-05-04T00:00:00.0000000Z 2022-05-04T00:00:00.0000000Z 27505
2022-05-05T00:00:00.0000000Z 2022-05-05T00:00:00.0000000Z 22808
2022-05-06T00:00:00.0000000Z 2022-05-06T00:00:00.0000000Z 23119
2022-05-07T00:00:00.0000000Z 2022-05-10T00:00:00.0000000Z 18750
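The query's output reports EndDate as the start of the last day in each range (and keeps Date as a string), while the expected result uses datetime values ending at 23:59:59.9999999. A minimal variant, assuming the Date strings are ISO 8601 values that todatetime() converts cleanly, converts Date to datetime before scanning and applies endofday() after the aggregation. The trailing order by is added because summarize does not guarantee row order. This is a sketch of the same grouping rule, not verified against a live cluster.

```kusto
datatable(Date:string, RecordCount:long)
[
    "2022-04-24T00:00:00.0000000Z", 825,
    "2022-04-25T00:00:00.0000000Z", 14925,
    "2022-04-26T00:00:00.0000000Z", 18498,
    "2022-04-27T00:00:00.0000000Z", 17558,
    "2022-04-28T00:00:00.0000000Z", 15626,
    "2022-04-29T00:00:00.0000000Z", 12755,
    "2022-04-30T00:00:00.0000000Z", 2203,
    "2022-05-01T00:00:00.0000000Z", 48594,
    "2022-05-02T00:00:00.0000000Z", 4976,
    "2022-05-03T00:00:00.0000000Z", 10835,
    "2022-05-04T00:00:00.0000000Z", 27505,
    "2022-05-05T00:00:00.0000000Z", 22808,
    "2022-05-06T00:00:00.0000000Z", 23119,
    "2022-05-07T00:00:00.0000000Z", 5141,
    "2022-05-08T00:00:00.0000000Z", 2217,
    "2022-05-09T00:00:00.0000000Z", 11334,
    "2022-05-10T00:00:00.0000000Z", 58,
]
| extend Date = todatetime(Date)           // work with real datetime values
| order by Date asc
| scan declare (acc_sum:long = 0, group_id:int = 0)
with
(
    // Same rule as above: start a new group when the running total would exceed 20,000
    step s1 : true => acc_sum  = RecordCount + iff(s1.acc_sum + RecordCount > 20000, 0, s1.acc_sum)
                     ,group_id = s1.group_id + iff(s1.acc_sum + RecordCount > 20000, 1, 0);
)
| summarize StartDate = min(Date), EndDate = max(Date), RecordCount = sum(RecordCount) by group_id
| extend EndDate = endofday(EndDate)       // last day of the range, at 23:59:59.9999999
| project-away group_id
| order by StartDate asc
```

Note that the grouping is greedy: each range keeps absorbing days until the next day would exceed 20,000. This matches the expected result here, but it is not a search for globally optimal ranges.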
