How to create an alert policy for an unknown custom metric in GCP
Given the following alert policy in GCP (created with Terraform):
resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"
  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      filter          = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_MAX"
        group_by_fields = [
          "metric.label.\"uri\"",
          "metric.label.\"method\"",
          "metric.label.\"status\"",
          "metadata.user_labels.\"app.kubernetes.io/name\"",
          "metadata.user_labels.\"app.kubernetes.io/component\""
        ]
      }
      trigger {
        count   = 1
        percent = 0
      }
    }
  }
}
I get the following error (this is part of the same Terraform project that creates the cluster):
Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.
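Now, this is a custom metric (emitted by a Spring Boot application via Micrometer), so the metric does not exist when the infrastructure is created. Does GCP have to know about a metric before an alert can be created for it? Would that mean a Spring Boot application has to be deployed on the cluster and send metrics before this policy can be created?
Am I missing something... (for example, that this should not be done in Terraform, as part of the infrastructure)?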
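Interesting question. The 404 means a resource could not be found; it seems that an existing metric descriptor is a prerequisite. I would create the metric descriptor first (you can use this as a reference) and then go on to create the alert policy.
That would be a neat way to avoid the error. Please comment on whether it makes sense, and if you get it working this way, please share it.
For reference (see also the Terraform docs for the alert policy):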
resource "google_monitoring_metric_descriptor" "p95_latency" {
  description  = ""
  display_name = ""
  type         = "custom.googleapis.com/http/server/requests/p95"
  metric_kind  = "GAUGE"
  value_type   = "DOUBLE"
  labels {
    key = "status"
  }
  labels {
    key = "uri"
  }
  labels {
    key = "exception"
  }
  labels {
    key = "method"
  }
  labels {
    key = "outcome"
  }
}
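The "descriptor first, then policy" ordering suggested above can be made explicit in Terraform. The following is a minimal sketch, not a confirmed fix: it assumes the google_monitoring_metric_descriptor.p95_latency resource shown above lives in the same module, it trims the aggregation settings from the question for brevity, and it assumes the declared labels match what Micrometer actually emits. Building the filter from the descriptor's type attribute gives Terraform an implicit dependency, so the descriptor is created before the alert policy.

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"
  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      # Referencing the descriptor's type (instead of repeating the string)
      # makes Terraform create the descriptor before this policy.
      filter          = "metric.type=\"${google_monitoring_metric_descriptor.p95_latency.type}\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_NEXT_OLDER"
      }
    }
  }
  # Redundant with the reference above; kept only to make the ordering obvious.
  depends_on = [google_monitoring_metric_descriptor.p95_latency]
}

Whether this removes the 404 in every case is not confirmed by the thread. If the error persists right after the descriptor is created, the descriptor may not yet be visible to the alerting API, which is an assumption worth checking before adding any workaround.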