Alertmanager does not load webhook_config

I want to create a new receiver and route for Alertmanager so that it sends a heartbeat to OpsGenie.

I tried to do this by defining an opsgenie_config, but I could not send a ping to the heartbeat in OpsGenie (I am able to send alerts to OpsGenie with the same API key).

Another approach I found is to use webhook_config (as suggested in #444). My manifest looks like this:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: opsgenie-webhook
  labels:
    managedBy: team-sre
spec:
  receivers:
  - name: heartbeat
    webhookConfigs:
    - httpConfig:
        basicAuth:
          password:
            name: opsgenie-api-key
            key: address
      url: https://api.opsgenie.com/v2/heartbeats/sre-test-cluster/ping
  route:
    groupWait: 0s
    repeatInterval: 1m
    groupInterval: 1m
    matchers:
    - name: alertname
      value: Watchdog
    receiver: heartbeat

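When I apply the manifest, the receiver and route it describes are not loaded into Alertmanager. When I check the logs, no error is recorded, but there is also no message indicating that the sidecar tried to load the new AlertmanagerConfig.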

Has anyone run into the same problem and knows how to solve it?

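I found the solution in GitHub issue #3970. For basicAuth to be accepted, both a username and a password must be provided. A neat trick is to set the username to ':' encoded in base64 (Og==). The manifest should be defined as follows: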

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    managedBy: team-sre
  name: alertmanager-opsgenie-config
  namespace: monitoring
spec:
  receivers:
  - name: deadmansswitch
    webhookConfigs:
      # URL of the specific heartbeat; replace <heartbeat-name> with the heartbeat name
      - url: 'https://api.opsgenie.com/v2/heartbeats/<heartbeat-name>/ping'
        sendResolved: true
        httpConfig:
          basicAuth:
            # reference to the Secret containing the login credentials
            password:
              key: apiKey
              name: opsgenie
            username:
              key: username
              name: opsgenie
  route:
    groupBy:
    - job
    groupInterval: 10s
    groupWait: 0s
    repeatInterval: 10s
    matchers:
      - name: alertname
        value: Watchdog
      - name: namespace
        value: monitoring
    receiver: deadmansswitch

---

apiVersion: v1
kind: Secret
metadata:
  namespace: monitoring
  name: opsgenie
type: Opaque
data:
  # API key, encoded in base64
  apiKey: YOUR_PASSWORD
  # ':' in base64 - fix suggested in https://github.com/prometheus-operator/prometheus-operator/issues/3970#issuecomment-888893008
  username: Og==

After applying the manifest and triggering an alert that matches the route, OpsGenie receives the heartbeat ping.
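
Values under `data:` in a Secret have to be base64-encoded, which is where the `Og==` username comes from. As a quick check, here is a minimal Python sketch that produces such values; the helper name `encode_secret_value` is made up for illustration and is not part of any Kubernetes or OpsGenie tooling.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Return the base64 form of a plain string, as expected under `data:` in a Secret."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


# The username placeholder used in the manifest above: a single colon.
print(encode_secret_value(":"))
```

The same helper can be used to encode the real API key before pasting it into the `apiKey` field.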