Is it possible to create a Grafana alert for any unhealthy Prometheus Consul targets?

Prometheus can be configured to scrape metrics from targets discovered through Consul.

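The "Targets" page in Prometheus shows an overview of the configured targets, including the healthy/total target count (in the example shown, 20 targets are healthy out of 22 in total).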

Is there a way to create an alert in Grafana that fires whenever not all targets are healthy? In the example above, the alert should fire because not all 22 targets are up.

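I found prometheus_sd_discovered_targets, which contains the total number of targets, but there does not seem to be a metric that exposes the number of healthy targets.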

As Raven pointed out, the up metric can be used for this. From the Prometheus documentation:

For each instance scrape, Prometheus stores a sample in the following time series:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.

The up time series is useful for instance availability monitoring.

A Prometheus query such as up < 1 returns the targets that are currently unhealthy.
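To make the healthy/total numbers from the Targets page concrete, here is a minimal Python sketch that derives them from the up metric via the Prometheus HTTP API instant-query endpoint (/api/v1/query). The function names, the base URL argument, and the sample payload are illustrative assumptions, not part of the original answer; the sample is hand-written in the shape of an instant-vector response and is not real data.

```python
import json
import urllib.parse
import urllib.request


def fetch_up_samples(base_url, job=None):
    """Run an instant query for `up` (optionally for one job) and return the JSON payload.

    base_url is an assumption, e.g. "http://localhost:9090".
    """
    query = 'up{job="%s"}' % job if job else "up"
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Return (healthy, total, unhealthy_instances) from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    results = payload["data"]["result"]
    # Each result carries its labels in "metric" and [timestamp, "value"] in "value".
    unhealthy = [
        r["metric"].get("instance", "<unknown>")
        for r in results
        if float(r["value"][1]) < 1
    ]
    return len(results) - len(unhealthy), len(results), unhealthy


# Hypothetical sample payload (three targets, one of them down).
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "consul", "instance": "10.0.0.1:9100"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "job": "consul", "instance": "10.0.0.2:9100"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "job": "consul", "instance": "10.0.0.3:9100"},
             "value": [1700000000, "0"]},
        ],
    },
}

healthy, total, unhealthy = summarize_targets(sample)
print("%d/%d targets healthy; unhealthy: %s" % (healthy, total, unhealthy))
```

Against a live server, summarize_targets(fetch_up_samples("http://localhost:9090")) would give the same kind of summary, assuming Prometheus is reachable at that address.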

From there you can create a Grafana alert with conditions such as:

when last() of query (A, 5m, now) is above -1
If no data or all values are null set state to Ok
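The threshold of -1 can look odd at first. Every series returned by up < 1 has the value 0, so "above -1" is true as soon as at least one unhealthy target exists, and the "no data" rule covers the case where the query returns nothing because every target is up. The following Python sketch models that decision logic; it is a simplified illustration with hypothetical names, assuming the alert fires when any returned series satisfies the condition, and it is not Grafana's actual implementation.

```python
def evaluate_alert(last_values, threshold=-1.0, no_data_state="ok"):
    """Model the alert condition for the query `up < 1`.

    last_values: the last() value of each series the query returned
    (one entry per unhealthy target, each equal to 0).
    """
    if not last_values:
        # All targets are healthy, so `up < 1` returns no series at all.
        return no_data_state
    # Fire if any series is above the threshold; 0 > -1 is always true.
    return "alerting" if any(v > threshold for v in last_values) else "ok"


print(evaluate_alert([]))                     # no unhealthy targets
print(evaluate_alert([0.0, 0.0]))             # two unhealthy targets
print(evaluate_alert([0.0], threshold=0.0))   # a threshold of 0 would miss the outage
```

The last call shows why the threshold has to sit below 0: with "is above 0", a series whose value is exactly 0 would never satisfy the condition.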

grafana

consul

prometheus