How to make alert rules visible on the Prometheus user interface?

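I'm trying to set up some alerting rules in Prometheus so that I get an alert when an instance goes down, but when I click Rules in the Prometheus UI, I don't see the rules I configured for alerting.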

I'm testing locally on my computer with Docker: Prometheus, Alertmanager, the Prometheus node_exporter, and some other applications listed on

Please help...

My prometheus.yml file is shown below (path: /Users/spencer.ecas/ops/prometheus.yml):

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: 'spencer'

alerting:
  alertmanagers:
   - static_configs:
     - targets:
       -  localhost:9093

rule_files:
  - alert.rules.yml

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus-server'

  - job_name: 'bis'
    scrape_interval: 5s
    metrics_path: /actor/prometheus
    static_configs:
      - targets: ['host.docker.internal:8790']
        labels:
          group: 'prometheus-bi-sanbox'

  - job_name: "node"
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:9100']
        labels:
          group: 'nodeexporter-server'

alert.rules.yml (path: /Users/spencer.ecas/ops/prometheus/alert.rules.yml):

groups:
- name: alert.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
  
  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host out of memory (instance {{ $labels.instance }})"
      description: "Node memory is filling up (< 25% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"


  - alert: HostOutOfDiskSpace
    expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 50
    for: 1s
    labels:
      severity: warning
    annotations:
      summary: "Host out of disk space (instance {{ $labels.instance }})"
      description: "Disk is almost full (< 50% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"


  - alert: HostHighCpuLoad
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host high CPU load (instance {{ $labels.instance }})"
      description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

alertmanager.yml (path: /Users/spencer.ecas/ops/alertmanager/alertmanager.yml):

Here I'm trying to forward the alerts to my Slack channel:

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-notifications'
receivers:
- name: 'slack-notifications'
  slack_configs:
  - api_url: "https://hooks.slack.com/services/T06J2AUUR/B03CYRJPBPC/HcgsYeG1jjbduwb"
    channel: '#alertmanager'
    send_resolved: true

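Everything seems to be set up correctly; the problem is most likely how you start the Prometheus and Alertmanager servers, specifically whether the files referenced in prometheus.yml actually exist inside the Prometheus container.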

Second, regarding your prometheus.yml file: are you sure Prometheus is actually reading the alert rules from this entry?
rule_files:
 - alert.rules.yml

So edit prometheus.yml and use this path under rule_files instead:

rule_files:
 - "/etc/prometheus/alert.rules.yml"
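For reference, Prometheus resolves a relative rule_files entry against the directory that contains the configuration file, so the path that matters is the one inside the container. A minimal Python sketch of that resolution (resolve_rule_file is a hypothetical helper, not a Prometheus API; the glob expansion Prometheus also applies to these entries is left out):

```python
import posixpath


def resolve_rule_file(config_path: str, rule_entry: str) -> str:
    """Hypothetical sketch of how a rule_files entry is resolved.

    Absolute entries are used as-is; relative entries are joined onto the
    directory containing the config file. Container paths are POSIX paths,
    hence posixpath. Glob expansion is not modelled.
    """
    if posixpath.isabs(rule_entry):
        return rule_entry
    config_dir = posixpath.dirname(config_path)
    return posixpath.normpath(posixpath.join(config_dir, rule_entry))


if __name__ == "__main__":
    # Inside the container, prometheus.yml is mounted at this path.
    cfg = "/etc/prometheus/prometheus.yml"
    print(resolve_rule_file(cfg, "alert.rules.yml"))
    print(resolve_rule_file(cfg, "/etc/prometheus/alert.rules.yml"))
```

With prometheus.yml mounted at /etc/prometheus/prometheus.yml, both entries resolve to /etc/prometheus/alert.rules.yml, so either form only works if alert.rules.yml is actually mounted at that location; the absolute path just makes the expectation explicit.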

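I suggest you remove the existing alertmanager and prometheus containers and start Prometheus with the command below. It mounts alert.rules.yml into the Prometheus container alongside prometheus.yml. This is necessary because alerting rules are evaluated by the Prometheus server (Alertmanager only routes the alerts Prometheus sends it), so the rules file has to exist inside the Prometheus container.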

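Before running the command, make sure your directory is laid out like this: /Users/spencer.ecas/ops/prometheus should contain the prometheus.yml file (as well as alert.rules.yml), and you should run the command from that directory, since both volume paths are built from $(pwd).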
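A quick pre-flight check can catch a wrong working directory before the container starts. This matters because when a host path given to -v does not exist, Docker creates an empty directory at that path instead of failing, which leaves Prometheus without a usable rules file. The following is a minimal Python sketch (the script and its missing_files helper are hypothetical, not part of Prometheus or Docker), meant to be run from the same directory as the docker run command:

```python
import os

# Files the docker run command mounts from the current directory via $(pwd).
REQUIRED_FILES = ("prometheus.yml", "alert.rules.yml")


def missing_files(directory: str) -> list[str]:
    """Return the required files that are not regular files in directory.

    os.path.isfile is used instead of os.path.exists so that a directory
    Docker may have auto-created at one of these paths also counts as missing.
    """
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    here = os.getcwd()
    missing = missing_files(here)
    if missing:
        print(f"Missing in {here}: {', '.join(missing)}; do not start the container yet")
    else:
        print(f"OK: {here} contains {', '.join(REQUIRED_FILES)}")
```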
docker run -d --name prometheus_ops -p 9191:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml prom/prometheus

The same command, split across lines for readability (treat the two as identical):

docker run -d --name prometheus_ops \
  -p 9191:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml \
  prom/prometheus
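Once the container is running, you can confirm that the rules were loaded without opening the UI by querying the Prometheus HTTP API endpoint /api/v1/rules, which reports the same rule groups the Rules page shows. A minimal Python sketch, assuming the host port 9191 from the -p 9191:9090 mapping (the helper names are hypothetical):

```python
import json
import urllib.request

# Assumes the "-p 9191:9090" port mapping from the docker run command.
RULES_URL = "http://localhost:9191/api/v1/rules"


def alerting_rules(payload: dict) -> list[tuple[str, str]]:
    """Return (file, rule name) pairs for every alerting rule in a /api/v1/rules response."""
    found = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules have type "recording"; only alerting rules are kept.
            if rule.get("type") == "alerting":
                found.append((group.get("file", ""), rule["name"]))
    return found


def fetch_rules(url: str = RULES_URL) -> dict:
    """Fetch and decode the JSON rules response from a running Prometheus."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def main() -> None:
    try:
        payload = fetch_rules()
    except OSError as exc:  # URLError (e.g. connection refused) is an OSError subclass
        print(f"Could not reach {RULES_URL}: {exc}")
        return
    rules = alerting_rules(payload)
    if not rules:
        print("No alerting rules loaded: check rule_files and the volume mounts")
    for file, name in rules:
        print(f"{name}  (from {file})")


if __name__ == "__main__":
    main()
```

If the list is empty, or the file shown is not /etc/prometheus/alert.rules.yml, the volume mount or the rule_files path is the place to look.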