How to inhibit alerts outside business hours with Prometheus Alertmanager?

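Our application depends on a data source that is only active during business hours. We have set up alerts in Prometheus to notify us when the stream dries up. However, we do not want to receive "false" alerts outside business hours.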

I followed this post to set up a "fake alert" that fires outside business hours and is supposed to inhibit all other alerts.

The setup looks like this. In Prometheus:

rules:

# This special alert will be used to inhibit all other alerts outside business hours
- alert: QuietHours
  expr: day_of_week() == 6 or day_of_week() == 0 or europe_amsterdam_hour >= 18 or europe_amsterdam_hour <= 7
  for: 1m
  labels:
    notification: page
    severity: critical
  annotations:
    description: 'This alert fires during quiet hours. It should be blackholed by Alertmanager.'

europe_amsterdam_hour is defined as a separate rule, which is omitted from this example for brevity.

In Alertmanager:

routes:
# ensure to forward to blackhole receiver during quiet hours
- match:
    alertname: QuietHours
  receiver: blackhole

inhibit_rules:
- source_match:
    alertname: QuietHours
  target_match_re:
    alertname: '[^(QuietHours)]'

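I have confirmed that the logic that fires the QuietHours alert works. It fires after business hours and resolves during business hours. However, the inhibition part does not seem to work, because I still receive other alerts while QuietHours is active. I could not find a good resource that explains the inhibition configuration in detail.

Any idea what I am doing wrong?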

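The problem is in your target_match_re: the syntax is incorrect. '[^(QuietHours)]' is a character class that matches a single character not in the listed set; it does not mean "any alert name except QuietHours". As stated in the inhibit_rule documentation, there is no need to exclude QuietHours in the first place: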

To prevent an alert from inhibiting itself, an alert that matches both the target and the source side of a rule cannot be inhibited by alerts for which the same is true (including itself).

The regular expression should match only the alerts related to your data source.
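If you want to keep matching on alert names, a minimal sketch could look like the following. The alert names StreamDry and StreamStale are hypothetical placeholders, not names from your setup. Alertmanager anchors the regular expression, so it has to match the whole label value.

```yaml
inhibit_rules:
- source_match:
    alertname: QuietHours
  target_match_re:
    # Hypothetical alert names; replace them with your own data-source alerts.
    # The regex is anchored, so each alternative must match the full alertname.
    alertname: 'StreamDry|StreamStale'
```

With this approach, every new data-source alert has to be added to the alternation by hand.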

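It is easier to add a label that identifies the alerts related to the data source, and to inhibit on that label, than to match on alert names.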

inhibit_rules:
- source_match:
    alertname: QuietHours
  target_match:
    component: 'data_source'

This way, any new alert related to the data source will be inhibited as well.
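On the Prometheus side, the data-source alerts then need to carry that label. A rough sketch, assuming a hypothetical alert StreamDry and a hypothetical counter stream_messages_received_total (neither comes from the question):

```yaml
rules:
# Hypothetical data-source alert; the alert name, metric, and durations are placeholders.
- alert: StreamDry
  expr: rate(stream_messages_received_total[10m]) == 0
  for: 10m
  labels:
    severity: critical
    # This is the label that target_match in the inhibit rule selects on.
    component: data_source
```

The QuietHours alert itself should not carry the component label unless you also want it on the target side; either way it cannot inhibit itself, as the documentation quoted above explains.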