Azure AKS Monitoring - custom dashboard resources

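I am trying to create a custom dashboard for an AKS cluster that shows some specific data. What I want is to assemble a dashboard with RAM and CPU usage charts for each selected controller and node and, if possible, the restart count per pod. How can I create custom charts based on the average resource usage per controller?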

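In the Azure portal, click the "Logs" link in the left-hand menu of your AKS cluster. (Click "Insights" first to make sure Insights is enabled: if it is, you will see charts close to what you want; otherwise you will see onboarding instructions.)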

Use the following query to chart the CPU utilization (95th percentile) of all containers in a given controller:

// Time window and bucket size for the trend
let endDateTime = now();
let startDateTime = ago(14d);
let trendBinSize = 1d;
// Perf counters: the container's CPU limit and its actual CPU usage
let capacityCounterName = 'cpuLimitNanoCores';
let usageCounterName = 'cpuUsageNanoCores';
// Replace these two values with your own cluster and controller
let clusterName = 'coin-test-i';
let controllerName = 'kube-svc-redirect';
// Step 1: list the containers that belong to the controller
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| where ClusterName == clusterName
| where ControllerName == controllerName
| extend InstanceName = strcat(ClusterId, '/', ContainerName),
         ContainerName = strcat(controllerName, '/', tostring(split(ContainerName, '/')[1]))
| distinct Computer, InstanceName, ContainerName
// Step 2: attach the CPU limit of each container, one value per time bucket
| join hint.strategy=shuffle (
    Perf
    | where TimeGenerated < endDateTime
    | where TimeGenerated >= startDateTime
    | where ObjectName == 'K8SContainer'
    | where CounterName == capacityCounterName
    | summarize LimitValue = max(CounterValue) by Computer, InstanceName, bin(TimeGenerated, trendBinSize)
    | project Computer, InstanceName, LimitStartTime = TimeGenerated, LimitEndTime = TimeGenerated + trendBinSize, LimitValue
) on Computer, InstanceName
// Step 3: attach the raw usage samples of each container
| join kind=inner hint.strategy=shuffle (
    Perf
    | where TimeGenerated < endDateTime + trendBinSize
    | where TimeGenerated >= startDateTime - trendBinSize
    | where ObjectName == 'K8SContainer'
    | where CounterName == usageCounterName
    | project Computer, InstanceName, UsageValue = CounterValue, TimeGenerated
) on Computer, InstanceName
// Keep only the usage samples that fall inside the bucket of the matched limit
| where TimeGenerated >= LimitStartTime and TimeGenerated < LimitEndTime
// Step 4: usage as a percentage of the limit, then the 95th percentile per bucket
| project Computer, ContainerName, TimeGenerated, UsagePercent = UsageValue * 100.0 / LimitValue
| summarize P95 = percentile(UsagePercent, 95) by bin(TimeGenerated, trendBinSize), ContainerName
| render timechart

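Replace the cluster name and controller name with your own. You can also adjust the start/end time parameters and the bin size, or use max/min/avg instead of the 95th percentile.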
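Since the question asks for average usage per controller, here is a rough sketch of a simpler variant. It reuses the same tables, columns and counter name as the query above, but skips the limit join and reports the mean of the raw usage samples instead of a percentage. The conversion to millicores (1 millicore = 1,000,000 nanocores) and the output column name AvgMilliCores are my own choices for illustration, not something Insights prescribes.

```kusto
let startDateTime = ago(14d);
let trendBinSize = 1d;
let clusterName = 'coin-test-i';            // replace with your cluster
let controllerName = 'kube-svc-redirect';   // replace with your controller
KubePodInventory
| where TimeGenerated >= startDateTime
| where ClusterName == clusterName
| where ControllerName == controllerName
| extend InstanceName = strcat(ClusterId, '/', ContainerName)
| distinct Computer, InstanceName, ControllerName
| join kind=inner hint.strategy=shuffle (
    Perf
    | where TimeGenerated >= startDateTime
    | where ObjectName == 'K8SContainer'
    | where CounterName == 'cpuUsageNanoCores'
    | project Computer, InstanceName, TimeGenerated, UsageValue = CounterValue
) on Computer, InstanceName
// Mean over all container samples in the bucket, converted from nanocores to millicores
| summarize AvgMilliCores = avg(UsageValue) / 1000000.0 by bin(TimeGenerated, trendBinSize), ControllerName
| render timechart
```

Note that this is the average per container sample, not the total across all pods of the controller. Once the chart looks right in "Logs", it can be pinned to an Azure dashboard from the query window.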

For memory metrics, replace the counter names with:

let capacityCounterName = 'memoryLimitBytes';
let usageCounterName = 'memoryRssBytes';
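For the restart count per pod mentioned in the question, something along these lines might work. This is only a sketch: it assumes your KubePodInventory table exposes the PodRestartCount and Name (pod name) columns, which you should confirm in the schema pane of the "Logs" view, and the one-day window is an arbitrary choice.

```kusto
let clusterName = 'coin-test-i';   // replace with your cluster
KubePodInventory
| where TimeGenerated >= ago(1d)
| where ClusterName == clusterName
// Assumption: PodRestartCount is a running counter, so take the highest value seen per pod
| summarize Restarts = max(PodRestartCount) by Name, ControllerName
| order by Restarts desc
```

The result is a table rather than a time chart, which is usually easier to read for spotting the pods that restart most often.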