Create a Prometheus alert that flips on and off every 1 minute
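I want to create a Prometheus alert that sends a firing alert every minute, then resolves itself and sends a resolved alert. Instead, what I see is that the alert keeps firing and never gets resolved.
This is the rule file: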
groups:
  - name: example
    rules:
      - alert: 'flipping rule'
        expr: minute() % 2
        for: 30s
This is the Prometheus configuration:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.8.158:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "prom-rule.yaml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
    relabel_configs:
      - source_labels: [branch]
        regex: HEAD
        action: drop
  - job_name: "nginx-exporter"
    static_configs:
      - targets: ["192.168.8.158:9113"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
    metric_relabel_configs:
      - regex: 'node_arp_entries'
        source_labels: [__name__]
        action: keep
      - regex: 'node_boot_time_seconds'
        source_labels: [__name__]
        action: keep
  - job_name: "cadvior"
    static_configs:
      - targets: ["localhost:9999"]
These pictures show that the alert just stays active instead of flipping on and off every minute as I expected.
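Adding an explicit comparison to the rule's expression should fix the problem. An alerting rule is active for every series its expression returns, regardless of the sample value, and minute() % 2 always returns one sample (0 or 1), so the alert never clears. With a comparison, the sample is filtered out whenever the condition is false, and the alert can resolve:
    expr: minute() % 2 == 0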
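For reference, here is a sketch of how the whole rule file might look with the suggested fix applied. It assumes the file is still the prom-rule.yaml named in the configuration and keeps the group name, alert name, and for: duration from the question; only the expr line changes.

```yaml
groups:
  - name: example
    rules:
      - alert: 'flipping rule'
        # minute() is the UTC minute of the hour (0-59). The "== 0" comparison
        # filters the sample out during odd minutes, so the expression returns
        # nothing and the alert becomes inactive.
        expr: minute() % 2 == 0
        # Within each even minute the alert is pending for the first 30s
        # and firing for the rest of that minute.
        for: 30s
```

With the 15s evaluation_interval from the question, this should give roughly 30 seconds of firing in every two-minute cycle. Whether Alertmanager then sends a notification on each flip also depends on its own grouping and timing settings, which are not shown in the question.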