SQL OLAP 与 KQL || row_number() 在 SQL 与 KQL 中的分区
SQL OLAP vs KQL || row_number() over partition by in SQL vs KQL
我在 SQL 中有 OLAP 查询,要求是使用 KQL 在 ADX(Azure 数据资源管理器)中获取相同的查询结果。
我是 KQL 的新手,正在尝试将相同的 SQL 转换为 KQL,但无法在 KQL 中逐个分区实现 row_number()。
尝试了 MS 的以下文章,但使用 EXPLAIN 关键字没有成功。
https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/sqlcheatsheet
下面是SQL,在KQL中需要相同。你能帮我解决这个问题吗?
Select logdate, adlspathL, count as totalRequests ,repeated as repeatedInLast7Days
from (select logdate, adlspathL, count ,SUM(rnk1) over(partition by adlspathL) repeated, rnk2
from(Select logdate, adlspathL, count, rnk,
rank() over(partition by adlspathL order by adlspathL) rnk1,
row_number() over(partition by adlspathL order by count desc) rnk2
from (Select logdate, adlspathL, count,
rank() over(partition by logdate order by count desc) rnk from ( Select logdate, adlspath as adlspathL, SUM(cnt) count
from `LogAnalyticsADLS`.`LogAnalyticsProdADLSAgg_OpsName`
where logdate between '2021-08-23' and '2021-08-29'
group by logdate, adlspathL
)a
)b
where rnk between 1 and 10
order by logdate , rnk
)a
)b
where rnk2 =1
谢谢,
梵天
这是一个 KQL 解决方案。
目前没有用于排名的 built-in 函数,仅用于密集排名(row_rank), however a rank functionallity could be achieved by the scan 运算符。
使用 arg_max 函数可以轻松实现从同一行检索最大值 + 附加值。
// Generation of a data sample. Not a prat of the solution
let loganalyticsprodadlsagg_opsname = range i from 1 to 1000 step 1 | project logdate = bin(datetime(2021-09-01) - 10d*rand(), 1d), adlspath = strcat("adlspath_", tostring(toint(rand(15)))), cnt = toint(rand(100));
// Solution starts here.
loganalyticsprodadlsagg_opsname
| where logdate between (datetime('2021-08-23') .. datetime('2021-08-29'))
| summarize cnt = sum(cnt) by logdate, adlspath
| partition by logdate
(
order by cnt desc
| scan declare(rank:int = 1, rn:int = 0)
with
(
step s : True => rn = s.rn + 1, rank = iff(cnt == s.cnt, s.rank, s.rn+1);
)
| where rank <= 10
)
| summarize totalrequests = arg_max(cnt, logdate), repeatedInLast7Days = count() by adlspath
adlspath
totalrequests
logdate
repeatedInLast7Days
adlspath_7
644
2021-08-23T00:00:00Z
4
adlspath_8
472
2021-08-23T00:00:00Z
6
adlspath_1
760
2021-08-24T00:00:00Z
6
adlspath_12
400
2021-08-24T00:00:00Z
5
adlspath_5
599
2021-08-24T00:00:00Z
3
adlspath_11
410
2021-08-25T00:00:00Z
4
adlspath_14
469
2021-08-25T00:00:00Z
5
adlspath_13
565
2021-08-25T00:00:00Z
6
adlspath_9
472
2021-08-25T00:00:00Z
2
adlspath_2
514
2021-08-26T00:00:00Z
6
adlspath_10
391
2021-08-26T00:00:00Z
4
adlspath_3
414
2021-08-27T00:00:00Z
4
adlspath_6
594
2021-08-28T00:00:00Z
3
adlspath_0
499
2021-08-28T00:00:00Z
6
adlspath_4
552
2021-08-29T00:00:00Z
6
我在 SQL 中有 OLAP 查询,要求是使用 KQL 在 ADX(Azure 数据资源管理器)中获取相同的查询结果。
我是 KQL 的新手,正在尝试将相同的 SQL 转换为 KQL,但无法在 KQL 中逐个分区实现 row_number()。
尝试了 MS 的以下文章,但使用 EXPLAIN 关键字没有成功。
https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/sqlcheatsheet
下面是SQL,在KQL中需要相同。你能帮我解决这个问题吗?
Select logdate, adlspathL, count as totalRequests ,repeated as repeatedInLast7Days
from (select logdate, adlspathL, count ,SUM(rnk1) over(partition by adlspathL) repeated, rnk2
from(Select logdate, adlspathL, count, rnk,
rank() over(partition by adlspathL order by adlspathL) rnk1,
row_number() over(partition by adlspathL order by count desc) rnk2
from (Select logdate, adlspathL, count,
rank() over(partition by logdate order by count desc) rnk from ( Select logdate, adlspath as adlspathL, SUM(cnt) count
from `LogAnalyticsADLS`.`LogAnalyticsProdADLSAgg_OpsName`
where logdate between '2021-08-23' and '2021-08-29'
group by logdate, adlspathL
)a
)b
where rnk between 1 and 10
order by logdate , rnk
)a
)b
where rnk2 =1
谢谢, 梵天
这是一个 KQL 解决方案。
目前没有用于排名的 built-in 函数,仅用于密集排名(row_rank), however a rank functionallity could be achieved by the scan 运算符。
使用 arg_max 函数可以轻松实现从同一行检索最大值 + 附加值。
// Generation of a data sample. Not a prat of the solution
let loganalyticsprodadlsagg_opsname = range i from 1 to 1000 step 1 | project logdate = bin(datetime(2021-09-01) - 10d*rand(), 1d), adlspath = strcat("adlspath_", tostring(toint(rand(15)))), cnt = toint(rand(100));
// Solution starts here.
loganalyticsprodadlsagg_opsname
| where logdate between (datetime('2021-08-23') .. datetime('2021-08-29'))
| summarize cnt = sum(cnt) by logdate, adlspath
| partition by logdate
(
order by cnt desc
| scan declare(rank:int = 1, rn:int = 0)
with
(
step s : True => rn = s.rn + 1, rank = iff(cnt == s.cnt, s.rank, s.rn+1);
)
| where rank <= 10
)
| summarize totalrequests = arg_max(cnt, logdate), repeatedInLast7Days = count() by adlspath
adlspath | totalrequests | logdate | repeatedInLast7Days |
---|---|---|---|
adlspath_7 | 644 | 2021-08-23T00:00:00Z | 4 |
adlspath_8 | 472 | 2021-08-23T00:00:00Z | 6 |
adlspath_1 | 760 | 2021-08-24T00:00:00Z | 6 |
adlspath_12 | 400 | 2021-08-24T00:00:00Z | 5 |
adlspath_5 | 599 | 2021-08-24T00:00:00Z | 3 |
adlspath_11 | 410 | 2021-08-25T00:00:00Z | 4 |
adlspath_14 | 469 | 2021-08-25T00:00:00Z | 5 |
adlspath_13 | 565 | 2021-08-25T00:00:00Z | 6 |
adlspath_9 | 472 | 2021-08-25T00:00:00Z | 2 |
adlspath_2 | 514 | 2021-08-26T00:00:00Z | 6 |
adlspath_10 | 391 | 2021-08-26T00:00:00Z | 4 |
adlspath_3 | 414 | 2021-08-27T00:00:00Z | 4 |
adlspath_6 | 594 | 2021-08-28T00:00:00Z | 3 |
adlspath_0 | 499 | 2021-08-28T00:00:00Z | 6 |
adlspath_4 | 552 | 2021-08-29T00:00:00Z | 6 |