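How to configure a CloudWatch alarm to evaluate once every X minutes
I want to configure a CloudWatch alarm to:
- sum the last 30 minutes of the ApplicationRequestsTotal metric once every 30 minutes
- alarm if the sum is equal to 0
I have configured the custom CloudWatch ApplicationRequestsTotal metric to be emitted once every 60 seconds for my service.
I have configured my alarm as follows: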
{
"MetricAlarms": [
{
"AlarmName": "radio-silence-alarm",
"AlarmDescription": "Alarm if 0 or less requests are received for 1 consecutive period(s) of 30 minutes.",
"ActionsEnabled": true,
"OKActions": [],
"InsufficientDataActions": [],
"MetricName": "ApplicationRequestsTotal",
"Namespace": "AWS/ElasticBeanstalk",
"Statistic": "Sum",
"Dimensions": [
{
"Name": "EnvironmentName",
"Value": "service-environment"
}
],
"Period": 1800,
"EvaluationPeriods": 1,
"Threshold": 0.0,
"ComparisonOperator": "LessThanOrEqualToThreshold",
"TreatMissingData": "missing"
}
],
"CompositeAlarms": []
}
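I have many alarms set up like this, and each one appears to:
- sum the last 30 minutes of the ApplicationRequestsTotal metric once every minute
For example, this service started receiving 0 ApplicationRequestsTotal at 8:36a, and CloudWatch triggered the alarm at exactly 9:06a.
Output of aws cloudwatch describe-alarm-history for the period above: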
{
"AlarmName": "radio-silence-alarm",
"AlarmType": "MetricAlarm",
"Timestamp": "2021-09-29T09:06:37.929000+00:00",
"HistoryItemType": "StateUpdate",
"HistorySummary": "Alarm updated from OK to ALARM",
"HistoryData": "{
"version":"1.0",
"oldState":{
"stateValue":"OK",
"stateReason":"Threshold Crossed: 1 datapoint [42.0 (22/09/21 08:17:00)] was not less than or equal to the threshold (0.0).",
"stateReasonData":{
"version":"1.0",
"queryDate":"2021-09-22T08:47:37.930+0000",
"startDate":"2021-09-22T08:17:00.000+0000",
"statistic":"Sum",
"period":1800,
"recentDatapoints":[
42.0
],
"threshold":0.0,
"evaluatedDatapoints":[
{
"timestamp":"2021-09-22T08:17:00.000+0000",
"sampleCount":30.0,
"value":42.0
}
]
}
},
"newState":{
"stateValue":"ALARM",
"stateReason":"Threshold Crossed: 1 datapoint [0.0 (29/09/21 08:36:00)] was less than or equal to the threshold (0.0).",
"stateReasonData":{
"version":"1.0",
"queryDate":"2021-09-29T09:06:37.926+0000",
"startDate":"2021-09-29T08:36:00.000+0000",
"statistic":"Sum",
"period":1800,
"recentDatapoints":[
0.0
],
"threshold":0.0,
"evaluatedDatapoints":[
{
"timestamp":"2021-09-29T08:36:00.000+0000",
"sampleCount":30.0,
"value":0.0
}
]
}
}
}"
}
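What have I misconfigured?
That is not how Amazon CloudWatch works.
When you create an alarm in CloudWatch, you specify:
- A metric (such as CPU Utilization, or perhaps a custom metric sent to CloudWatch)
- A time period (such as the previous 30 minutes)
- An aggregation method (such as Average, Sum, Count)
For example, a CloudWatch alarm can trigger if the Average of the metric over the previous 30 minutes crosses the threshold. This is continually evaluated as a sliding window. It does not look at the metric in distinct 30-minute time periods.
Using your example, the alarm triggers whenever the Sum of the metric over the previous 30 minutes is zero, and that check runs on a continual basis.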
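To make the difference concrete, here is a minimal Python sketch that simulates both behaviours on per-minute request counts. It is only an illustration of the sliding-window idea, not how CloudWatch is implemented internally. The function names and the sample data are hypothetical, and the "tumbling" variant assumes windows aligned to the half hour, which is the behaviour the question expected.

```python
from typing import List, Optional


def first_alarm_sliding(counts: List[int], window: int) -> Optional[int]:
    """Evaluate once per minute: return the first minute at which the
    trailing `window` minutes sum to zero or less, else None."""
    for end in range(window, len(counts) + 1):
        if sum(counts[end - window:end]) <= 0:
            return end
    return None


def first_alarm_tumbling(counts: List[int], window: int) -> Optional[int]:
    """Evaluate once per `window` minutes on distinct, non-overlapping
    windows: return the first window end whose sum is zero or less."""
    for end in range(window, len(counts) + 1, window):
        if sum(counts[end - window:end]) <= 0:
            return end
    return None


# Hypothetical data: index 0 is 08:00, one datapoint per minute.
# One request per minute until 08:36, then silence for the rest of two hours.
counts = [1] * 36 + [0] * 84

sliding = first_alarm_sliding(counts, 30)
tumbling = first_alarm_tumbling(counts, 30)
print(f"sliding window alarm at minute {sliding}")
print(f"distinct 30-minute windows alarm at minute {tumbling}")
```

With this sample data the sliding evaluation fires at minute 66 (09:06), exactly 30 minutes after the traffic stopped, which matches the alarm history in the question. Distinct half-hour windows would not fire until minute 90 (09:30), because the 08:30 to 09:00 window still contains six minutes of traffic.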