How to make alert rules visible on the Prometheus user interface?
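I am trying to set up some alert rules in Prometheus so that I get alerted when an instance goes down, but when I click the Rules icon in the Prometheus UI, I don't see the rules I configured for alerting.
I am testing locally on my machine with Docker, running prometheus, alertmanager, prom node_exporter and a few other applications.
Please help...
The prometheus.yml file is shown below.
PWD - /Users/spencer.ecas/ops/prometheus.yml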
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: 'spencer'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - alert.rules.yml
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus-server'
  - job_name: 'bis'
    scrape_interval: 5s
    metrics_path: /actor/prometheus
    static_configs:
      - targets: ['host.docker.internal:8790']
        labels:
          group: 'prometheus-bi-sanbox'
  - job_name: "node"
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:9100']
        labels:
          group: 'nodeexporter-server'
alert.rules.yml
PWD - /Users/spencer.ecas/ops/prometheus/alert.rules.yml
groups:
  - name: alert.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minutes."
      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable / node_memory_MemTotal * 100 < 25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 25% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HostOutOfDiskSpace
        expr: (node_filesystem_avail{mountpoint="/"} * 100) / node_filesystem_size{mountpoint="/"} < 50
        for: 1s
        labels:
          severity: warning
        annotations:
          summary: "Host out of disk space (instance {{ $labels.instance }})"
          description: "Disk is almost full (< 50% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HostHighCpuLoad
        expr: (sum by (instance) (irate(node_cpu{job="node_exporter_metrics",mode="idle"}[5m]))) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
alertmanager.yml
PWD - /Users/spencer.ecas/ops/alertmanager/alertmanager.yml
Here I am trying to forward alerts to my Slack channel.
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T06J2AUUR/B03CYRJPBPC/HcgsYeG1jjbduwb"
        channel: '#alertmanager'
        send_resolved: true
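Everything seems to be set up correctly, but the problem here is probably how you start the prometheus and alert-manager servers with the prometheus.yml file.
Secondly, regarding your prometheus.yml file: are you sure the config file is actually reading the alert rules from this entry?
rule_files:
  - alert.rules.yml
So please edit the prometheus.yml file and use this path under rule_files instead:
rule_files:
  - "/etc/prometheus/alert.rules.yml"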
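To see why the path matters, here is a small illustration of how a rule_files entry maps to a file inside the container. It assumes, as far as I understand Prometheus, that a relative entry is resolved against the directory holding the config file. The function name resolve_rule_path is made up for this sketch and is not part of Prometheus.

```shell
# Illustration only: mimic how a rule_files entry maps to a path.
# $1 = path of the config file, $2 = entry listed under rule_files
resolve_rule_path() {
  case "$2" in
    /*) printf '%s\n' "$2" ;;                       # absolute: used as-is
    *)  printf '%s/%s\n' "$(dirname "$1")" "$2" ;;  # relative: joined to the config directory
  esac
}

# Both forms point at the same file when the config lives in /etc/prometheus
resolve_rule_path /etc/prometheus/prometheus.yml alert.rules.yml
resolve_rule_path /etc/prometheus/prometheus.yml /etc/prometheus/alert.rules.yml
```

Either way, the rules only show up if a file really exists at that path inside the container. The absolute path just makes the expected location explicit.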
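I suggest removing the alertmanager and prometheus containers and using the command below. Mounting alert.rules.yml into the prometheus container at startup makes the rules available inside the container, which matters because the prometheus server is what evaluates the rules and fires the alerts.
Make sure the directory exists before running the command.
You should have the prometheus.yml file in /Users/spencer.ecas/ops/prometheus, next to alert.rules.yml, and run the command from that directory, because both -v mounts use $(pwd).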
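A quick pre-flight check can save some confusion here. When the host path given to -v does not exist, Docker creates it as a directory, so the container would end up with a directory where a file is expected. The helper below is a hypothetical sketch (the name check_mount_sources is mine) that only confirms both files are regular files in the given directory.

```shell
# Hypothetical helper: confirm both bind-mount sources are regular files.
# $1 = directory to check (defaults to the current directory)
check_mount_sources() {
  dir="${1:-$(pwd)}"
  for f in prometheus.yml alert.rules.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing or not a regular file: $dir/$f"
      return 1
    fi
  done
  echo "ok"
}

# Usage, from the directory you will run docker from:
#   check_mount_sources && echo "safe to run docker"
```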
docker run -d --name prometheus_ops -p 9191:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml prom/prometheus
This is just a more readable layout of the command above; the two are identical.
docker run -d --name prometheus_ops -p 9191:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml \
  prom/prometheus
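Once the container is up, you can check whether the rules were loaded without opening the UI by querying the /api/v1/rules endpoint of the Prometheus HTTP API. The sketch below assumes the 9191 host port from the command above. The helper has_rule is a hypothetical name and does a plain text match on the response rather than real JSON parsing.

```shell
# Hypothetical helper: succeed if the JSON on stdin contains a rule
# (or rule group) with the given name. Text match only, not a JSON parser.
has_rule() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Usage against the port mapping from the docker command (9191 -> 9090):
#   curl -s http://localhost:9191/api/v1/rules | has_rule InstanceDown && echo "rule loaded"
```

If the rule is reported as loaded, it should also appear under the Rules page of the UI at http://localhost:9191.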