Prometheus: How to disable 1 rule for 1 specific job_name?
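I'm setting up Prometheus alerting for 2 Elasticsearch clusters (using elasticsearch_exporter): one with 8 nodes and one with 3 nodes.
I want an alert to fire when either cluster loses 1 node, but at the moment every rule applies to both clusters, so that isn't possible.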
prometheus.yml file:
global:
  scrape_interval: 10s
rule_files:
  - alert.rules.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: cluster1
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: "/metrics"
    static_configs:
      - targets: ['xxx1:9114']
        labels:
          service: cluster1
  - job_name: cluster2
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: "/metrics"
    static_configs:
      - targets: ['xxx2:9114']
        labels:
          service: cluster2
alert.rules.yml file:
groups:
  - name: alert.rules
    rules:
      - alert: ElasticsearchLostNode
        expr: elasticsearch_cluster_health_number_of_nodes < 8
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
          description: Number Healthy Nodes less than 8
      ...
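Of course, number_of_nodes < 8 will always fire for the small cluster, and if I set < 3 instead, no alert fires when the large cluster loses 1 node.
Is there a way to exempt 1 specific job_name from 1 specific rule, or to define rules A that apply only to job_name A and rules B that apply only to job_name B?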
Yes, you can create one rule per job in the alert.rules.yml file, using a job label matcher in each expression:
groups:
  - name: alert.rules
    rules:
      - alert: ElasticsearchLostNode1
        expr: elasticsearch_cluster_health_number_of_nodes{job="cluster1"} < 8
        ...
      - alert: ElasticsearchLostNode2
        expr: elasticsearch_cluster_health_number_of_nodes{job="cluster2"} < 3
        ...
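For reference, here is a fuller sketch of the two per-job rules. It assumes the elided fields are simply carried over from the rule in the question: the `for`, `labels`, and `annotations` values below are copied from there and are not specified by the answer itself. Prometheus attaches a `job` label to every scraped series based on the `job_name` in prometheus.yml, which is what the matchers rely on.

```yaml
groups:
  - name: alert.rules
    rules:
      # Evaluated only against series scraped by job_name cluster1 (8 nodes expected).
      - alert: ElasticsearchLostNode1
        expr: elasticsearch_cluster_health_number_of_nodes{job="cluster1"} < 8
        for: 1m            # assumed: copied from the question's rule
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
          description: Number of healthy nodes is less than 8
      # Evaluated only against series scraped by job_name cluster2 (3 nodes expected).
      - alert: ElasticsearchLostNode2
        expr: elasticsearch_cluster_health_number_of_nodes{job="cluster2"} < 3
        for: 1m            # assumed: copied from the question's rule
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
          description: Number of healthy nodes is less than 3
```

The file can be validated with `promtool check rules alert.rules.yml` before reloading Prometheus. To exempt one job from a rule rather than target a single job, a negative matcher such as `{job!="cluster2"}` should work the same way.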