How to create an alert policy for an unknown custom metric in GCP

Given the following alert policy in GCP (created with Terraform):

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"
  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      filter          = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_MAX"
        group_by_fields = [
          "metric.label.\"uri\"",
          "metric.label.\"method\"",
          "metric.label.\"status\"",
          "metadata.user_labels.\"app.kubernetes.io/name\"",
          "metadata.user_labels.\"app.kubernetes.io/component\""
        ]
      }
      trigger {
        count   = 1
        percent = 0
      }
    }
  }
}

I get the following error (this policy is part of the same Terraform project that creates the cluster):

Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.

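Now, this is a custom metric (published by a Spring Boot app via Micrometer), so the metric does not exist yet when the infrastructure is created. Does GCP have to know about a metric before an alert can be created for it? Would that mean a Spring Boot app has to be deployed on the cluster and sending metrics before this policy can be created?

Am I missing something (for example, that this shouldn't be done in Terraform, as part of the infrastructure)?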

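Interesting question. The 404 error means a resource could not be found; here, the metric descriptor appears to be a prerequisite that must already exist. I would create the metric descriptor first (you can use this as a reference) and then go on to create the alert policy.

This is a neat way to avoid the problem. Please comment on whether it makes sense, and if you get it working this way, please share.

For reference (see also the alert policy page in the Terraform docs):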

resource "google_monitoring_metric_descriptor" "p95_latency" {
  description  = ""
  display_name = ""
  type         = "custom.googleapis.com/http/server/requests/p95"
  metric_kind  = "GAUGE"
  value_type   = "DOUBLE"

  labels {
    key = "status"
  }
  labels {
    key = "uri"
  }
  labels {
    key = "exception"
  }
  labels {
    key = "method"
  }
  labels {
    key = "outcome"
  }

}
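
To make sure Terraform creates the descriptor before the policy, the policy can reference the descriptor instead of hard-coding the metric type. The following is a minimal sketch of that idea, not a tested configuration: it trims the policy from the question (the group_by_fields are omitted for brevity) and assumes the p95_latency descriptor above lives in the same Terraform project. The time_sleep resource and its 30s delay are my own assumptions, in case the new descriptor is not visible to the alerting API right away; it needs the hashicorp/time provider and can be dropped if the policy is created reliably without it.

```hcl
# Optional pause after the descriptor is created.
# The 30s value is an assumption, not a documented requirement.
resource "time_sleep" "wait_for_descriptor" {
  depends_on      = [google_monitoring_metric_descriptor.p95_latency]
  create_duration = "30s"
}

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"

  # Explicit ordering: descriptor first, then the pause, then the policy.
  depends_on = [time_sleep.wait_for_descriptor]

  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      # Interpolating the descriptor's type also creates an implicit
      # dependency on the descriptor resource.
      filter          = "metric.type=\"${google_monitoring_metric_descriptor.p95_latency.type}\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_MAX"
      }
      trigger {
        count = 1
      }
    }
  }
}
```

Referencing the descriptor's type attribute in the filter is enough on its own to order the two resources; the explicit depends_on is only there to route the policy through the optional delay.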