Cloudwatch metric filter sees event, but alarm doesn't fire

I have set up a Cloudwatch metric filter to monitor a log file:


resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern = "{ $._logLevel = \"error\" }"
  metric_transformation {
    name = "${local.fullname}-error-count"
    namespace = "MyApp"
    value     = "1"
  }
}

I can see the metric working: the metric graph shows a data point at 13:15 (I created a log entry manually to test it).

The alarm should fire if the metric reports 1 or more events within one minute:


resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name        = "${local.fullname}-log-errors"
  alarm_description = "log.error() count for MyApp lambda ${local.fullname}"
  metric_name         = "${local.fullname}-error-count"
  threshold           = "0"
  statistic           = "Sum"
  unit                = "Count"
  comparison_operator = "GreaterThanThreshold"
  datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period    = "60"
  namespace = "MyApp"
  treat_missing_data = "notBreaching"
  alarm_actions      = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions         = [data.aws_ssm_parameter.sns_topic_arn.value]
}

But even though the metric has an event (as described above), the alarm never fires.

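I'm not sure how to debug this. All the AWS resources were created successfully, the error I created manually reaches the metric, and I use very similar alarm configurations successfully in other Lambdas, where the alarms do fire.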

Why does my metric work, but my alarm doesn't fire?

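I have some very similar setups running and will try this out. Update: looking more closely, I believe you should use comparison_operator = "GreaterThanOrEqualToThreshold" with threshold = "1" instead of comparison_operator = "GreaterThanThreshold", and add default_value = "0" to the metric transformation: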

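resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"
  metric_transformation {
    name          = "${local.fullname}-error-count"
    namespace     = "MyApp"
    value         = "1"
    default_value = "0" # publish 0 for matched-log events that are not errors
  }
}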

resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name        = "${local.fullname}-log-errors"
  alarm_description = "log.error() count for MyApp lambda ${local.fullname}"
  metric_name         = "${local.fullname}-error-count"
  threshold           = "1"
  statistic           = "Sum"
  #unit                = "Count"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  #datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period    = "60"
  namespace = "MyApp"
  treat_missing_data = "notBreaching"
  alarm_actions      = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions         = [data.aws_ssm_parameter.sns_topic_arn.value]
}

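unit and datapoints_to_alarm are both optional arguments; try leaving them out. I'm assuming the aws_cloudwatch_log_metric_filter and aws_cloudwatch_metric_alarm resources both use the same local variables. Since you didn't post all of the aws_cloudwatch_log_metric_filter arguments, I assume your pattern = "" is as it should be.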

我会把钱花在 metric_alarmmetric_filter 之间不一致的单位上。

You set unit to Count on the metric_alarm, but you didn't set unit in the metric_filter's metric_transformation, so the metric_transformation defaults to None.
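One way to make the units agree is from the filter side. The following is a minimal sketch that keeps the question's local.fullname, log group and pattern, and sets unit in the metric_transformation so the filter publishes Count, matching the alarm's unit = "Count". It assumes an AWS provider version recent enough to support the unit argument in metric_transformation. Only data points published after the change carry the new unit.

```hcl
# Sketch: publish the filter's metric with an explicit unit so it matches
# the alarm's unit = "Count". Omitting unit here makes the unit None.
resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"

  metric_transformation {
    name      = "${local.fullname}-error-count"
    namespace = "MyApp"
    value     = "1"
    unit      = "Count" # must match the unit the alarm asks for
  }
}
```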

尝试将警报中的 unit 设置为 None 或完全删除 unit