Create a Prometheus alert that flips on and off every 1 minute

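I want to create a Prometheus alert that sends a firing notification every minute, then resolves itself and sends a resolved notification. Instead, the alert keeps firing and never resolves.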

Here is the rule file:

groups:
- name: example
  rules:
  - alert: 'flipping rule'
    expr: minute() % 2
    for: 30s

And here is the Prometheus configuration:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.8.158:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "prom-rule.yaml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
    relabel_configs:
      - source_labels: [branch]
        regex: HEAD
        action: drop
  - job_name: "nginx-exporter"
    static_configs:
      - targets: ["192.168.8.158:9113"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
    metric_relabel_configs:
      - regex: 'node_arp_entries'
        source_labels: [__name__]
        action: keep
      - regex: 'node_boot_time_seconds'
        source_labels: [__name__]
        action: keep
  - job_name: "cadvior"
    static_configs:
      - targets: ["localhost:9999"]

The screenshots show that the alert simply stays active instead of flipping on and off every minute as I expected.

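Adding an explicit comparison to the rule's expression should fix the problem. An alerting rule is active for every series its expression returns, whatever the value, and minute() % 2 always returns a sample (either 0 or 1), so the alert never resolves. A comparison filters that sample out whenever it is false, like this: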

expr: minute() % 2 == 0
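
Applied to the rule file from the question, the change might look like the sketch below. Only the expr line differs from the original; the comments are added for illustration. Exactly when the firing and resolved notifications arrive also depends on the Alertmanager grouping settings, which are not shown in the question.

```yaml
groups:
- name: example
  rules:
  - alert: 'flipping rule'
    # minute() returns the current minute of the hour in UTC (0-59).
    # The == 0 comparison acts as a filter: the expression returns a
    # sample only during even minutes and returns nothing during odd
    # minutes, which is what allows the alert to resolve.
    expr: minute() % 2 == 0
    # Unchanged from the question: the alert stays pending for roughly
    # the first 30s of each even minute before it starts firing.
    for: 30s
```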