CloudWatch metric filter sees event, but alarm doesn't fire
I have set up a CloudWatch metric filter to watch a log file:
resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"

  metric_transformation {
    name      = "${local.fullname}-error-count"
    namespace = "MyApp"
    value     = "1"
  }
}
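I can see that the metric is working: note the data point at 13:15 below (I created a log entry by hand to test it):
The alarm should fire if the metric reports 1 or more events within one minute: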
resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name          = "${local.fullname}-log-errors"
  alarm_description   = "log.error() count for MyApp lambda ${local.fullname}"
  metric_name         = "${local.fullname}-error-count"
  threshold           = "0"
  statistic           = "Sum"
  unit                = "Count"
  comparison_operator = "GreaterThanThreshold"
  datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period              = "60"
  namespace           = "MyApp"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions          = [data.aws_ssm_parameter.sns_topic_arn.value]
}
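But even though the metric recorded an event (as shown above), the alarm never fired:
I'm not sure how to debug this: all of the AWS resources were created successfully, the error I generated by hand reaches the metric, and I use a very similar alarm configuration in other lambdas, where it does fire.
Why does my metric work while my alarm stays silent?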
I have a very similar setup running and will try this. Update: on closer inspection, I believe you should use comparison_operator = "GreaterThanOrEqualToThreshold"
rather than comparison_operator = "GreaterThanThreshold":
metric_transformation {
  name          = "${local.fullname}-error-count"
  namespace     = "MyApp"
  value         = "1"
  default_value = "0"
}
and
resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name          = "${local.fullname}-log-errors"
  alarm_description   = "log.error() count for MyApp lambda ${local.fullname}"
  metric_name         = "${local.fullname}-error-count"
  threshold           = "1"
  statistic           = "Sum"
  #unit               = "Count"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  #datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period              = "60"
  namespace           = "MyApp"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions          = [data.aws_ssm_parameter.sns_topic_arn.value]
}
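unit and datapoints_to_alarm are both optional arguments, so try leaving them out. I assume the aws_cloudwatch_log_metric_filter and aws_cloudwatch_metric_alarm resources both use the same local variable. Since you didn't post all of the aws_cloudwatch_log_metric_filter arguments, I assume your pattern = "" is as it should be.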
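I'd put my money on the mismatched units between the metric_alarm and the metric_filter.
You set unit to Count on the metric_alarm, but you didn't set unit in the metric_filter's metric_transformation, so the metric_transformation defaults to None. An alarm that specifies a unit only evaluates data points published with that same unit, so it never sees the data points the filter emits.
Try setting unit to None in the alarm, or remove unit entirely.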
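To make the unit mismatch concrete, here is a minimal sketch of one way to line the two resources up. It reuses the resource names, local.fullname, and the SSM data source from the question. It also assumes your AWS provider version supports the optional unit argument inside metric_transformation; if it does not, dropping unit from the alarm is the simpler route. Treat it as an illustration, not a verified fix for your account.

```hcl
# Sketch only: both resources agree on the unit "Count".
resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"

  metric_transformation {
    name      = "${local.fullname}-error-count"
    namespace = "MyApp"
    value     = "1"
    # Assumption: the provider accepts `unit` here. Without it, the
    # metric is published with the unit None.
    unit      = "Count"
  }
}

resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name          = "${local.fullname}-log-errors"
  alarm_description   = "log.error() count for MyApp lambda ${local.fullname}"
  namespace           = "MyApp"
  metric_name         = "${local.fullname}-error-count"
  statistic           = "Sum"
  # Must match the unit the filter publishes. Alternatively, delete this
  # line from the alarm and leave the filter's unit unset.
  unit                = "Count"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  evaluation_periods  = 1
  period              = 60
  treat_missing_data  = "notBreaching"
  alarm_actions       = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions          = [data.aws_ssm_parameter.sns_topic_arn.value]
}
```

After applying, data points already published with the unit None will still not match an alarm that asks for Count, so generate a fresh test log entry before judging whether the alarm fires.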