How can I improve a KQL query over a large dataset for a heatmap?
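I have the KQL query below, which produces a very nice heatmap of the top Azure WAF accesses by country/region.
The problem is that I can't run this query over more than 24 hours, because I have too many records. How can I improve it so that it shows weekly and monthly statistics?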
```kql
// source: https://datahub.io/core/geoip2-ipv4
set notruncation;
let CountryDB=externaldata(Network:string, geoname_id:string, continent_code:string, continent_name:string, country_iso_code:string, country_name:string)
[@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
| extend Dummy=1;                                // constant key used to pair every IP with every network range
let AppGWAccess = AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| where userAgent_s !contains "bot"              // `!in ("bot")` only excluded user agents exactly equal to "bot"
| project TimeGenerated, clientIP_s;
AppGWAccess
| extend Dummy=1
| summarize count() by Hour=bin(TimeGenerated,6h), clientIP_s, Dummy   // request count per client IP per 6-hour bucket
| partition by Hour (
    lookup (CountryDB|extend Dummy=1) on Dummy
    | where ipv4_is_match(clientIP_s, Network)   // keep only the network range that contains the client IP
)
| summarize sum(count_) by country_name
```
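What you are doing is re-aggregating all of the raw data (in 6-hour bins) every time the query runs. Instead, you should create a materialized view that performs the aggregation for you in the background.
Quoting the documentation: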
Materialized views expose an aggregation query over a source table. Materialized views always return an up-to-date result of the aggregation query (always fresh). Querying a materialized view is more performant than running the aggregation directly over the source table, which is performed each query.
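A minimal sketch of that approach, assuming the Application Gateway logs are available as an `AzureDiagnostics` table in an Azure Data Explorer database (where materialized views are created with the `.create materialized-view` management command) and that you have permission to create views there. The view name `AppGwAccessByIp` and the column name `RequestCount` are hypothetical. Part 1 creates a view holding hourly request counts per client IP; part 2 reads the last 30 days from that view instead of the raw table and then does the same country lookup as the original query. Run the two parts separately, since the first is a management command and the second is a regular query.

```kql
// Part 1 (run once): create the view. `AppGwAccessByIp` is a hypothetical name.
// backfill=true also aggregates records already in the table; async runs the creation as a background operation.
.create async materialized-view with (backfill=true) AppGwAccessByIp on table AzureDiagnostics
{
    AzureDiagnostics
    | where ResourceType == "APPLICATIONGATEWAYS"
    | where Category == "ApplicationGatewayAccessLog"
    | where userAgent_s !contains "bot"          // assumption: exclude any user agent containing "bot"
    | summarize RequestCount = count() by Hour = bin(TimeGenerated, 1h), clientIP_s
}

// Part 2 (run as a separate query): per-country totals for the last 30 days, read from the view.
let CountryDB=externaldata(Network:string, geoname_id:string, continent_code:string, continent_name:string, country_iso_code:string, country_name:string)
[@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
| extend Dummy=1;
AppGwAccessByIp
| where Hour > ago(30d)                           // use ago(7d) for the weekly statistics
| summarize RequestCount = sum(RequestCount) by Day = bin(Hour, 1d), clientIP_s
| extend Dummy=1
| partition by Day (
    lookup CountryDB on Dummy
    | where ipv4_is_match(clientIP_s, Network)    // keep only the network range that contains the client IP
)
| summarize Requests = sum(RequestCount) by country_name
```

Because the view is kept up to date as new records arrive, part 2 mostly scans pre-aggregated hourly rows rather than every raw log record, so changing the time window only changes the `ago(...)` filter. Re-binning to one day before the lookup is a design choice in this sketch that keeps the number of partitions and rows per partition small.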