Boto / Cloudwatch recover instance alarm
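I've been banging my head against the wall trying to get this working.
I'm trying to use python/boto to create a CloudWatch alarm that recovers a failed EC2 instance.
I'm having trouble getting the ec2:RecoverInstance action to work. I suspect my topic isn't set up correctly.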
topics = sns_conn.get_all_topics()
topic = topics[u'ListTopicsResponse']['ListTopicsResult']['Topics'][0]['TopicArn']
# arn:aws:sns:us-east-1:*********:CloudWatch
status_check_failed_alarm = boto.ec2.cloudwatch.alarm.MetricAlarm(
    connection=cw_conn,
    name=_INSTANCE_NAME + "RECOVERY-High-Status-Check-Failed-Any",
    metric='StatusCheckFailed',
    namespace='AWS/EC2',
    statistic='Average',
    comparison='>=',
    description='status check for %s %s' % (_INSTANCE, _INSTANCE_NAME),
    threshold=1.0,
    period=60,
    evaluation_periods=5,
    dimensions={'InstanceId': _INSTANCE},
    # alarm_actions = [topic],
    ok_actions=[topic],
    insufficient_data_actions=[topic])
# status_check_failed_alarm.add_alarm_action('arn:aws:sns:us-east-1:<acct#>:ec2:recover')
# status_check_failed_alarm.add_alarm_action('arn:aws:sns:us-east-1:<acct#>:ec2:RecoverInstances')
status_check_failed_alarm.add_alarm_action('ec2:RecoverInstances')
cw_conn.put_metric_alarm(status_check_failed_alarm)
Any suggestions would be appreciated.
Thanks.
--Mike
I think the problem is that these alarm actions don't have <acct> in the ARN. The CLI reference documents the valid ARNs:
Valid Values: arn:aws:automate:region:ec2:stop | arn:aws:automate:region:ec2:terminate | arn:aws:automate:region:ec2:recover
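The valid values above can be checked mechanically before calling put_metric_alarm. The following is a minimal sketch; the helper names `recover_action_arn` and `is_valid_alarm_action` are hypothetical (not part of boto), they only check the string shape of the documented automate ARNs and of ordinary SNS topic ARNs, they assume the standard `aws` partition, and they never contact AWS. Both strings the question tries, the IAM-style `ec2:RecoverInstances` and the SNS-style ARN with an account number, fail the check.

```python
import re

# Hypothetical helpers, not part of boto: they only check string shapes.
# EC2 automate actions for CloudWatch alarms carry no account ID.
_AUTOMATE_RE = re.compile(r"^arn:aws:automate:[a-z0-9-]+:ec2:(stop|terminate|recover)$")
# SNS topic ARNs, by contrast, do carry a 12-digit account ID.
_SNS_RE = re.compile(r"^arn:aws:sns:[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]+(\.fifo)?$")


def recover_action_arn(region):
    """Build the EC2 recover action ARN for the given region."""
    return "arn:aws:automate:%s:ec2:recover" % region


def is_valid_alarm_action(action):
    """Return True if `action` looks like an EC2 automate ARN or an SNS topic ARN."""
    return bool(_AUTOMATE_RE.match(action) or _SNS_RE.match(action))


if __name__ == "__main__":
    for action in [
        recover_action_arn("us-east-1"),
        "ec2:RecoverInstances",                       # IAM-style action name, not an ARN
        "arn:aws:sns:us-east-1:<acct#>:ec2:recover",  # SNS-style ARN, wrong service
    ]:
        print(action, is_valid_alarm_action(action))
```

The region in the recover ARN should be the instance's own region, which is why the sketch builds it from a parameter instead of hard-coding `us-east-1`.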
I think it's easier to pull the metric from AWS and create the alarm from it than to build the alarm from scratch, e.g. (untested code):
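import boto.ec2.cloudwatch
import boto.sns

# Connections in the same region as the instance and the recover ARN below
sns_conn = boto.sns.connect_to_region('us-east-1')
cloudwatch_conn = boto.ec2.cloudwatch.connect_to_region('us-east-1')

# First SNS topic in the account (used for notifications only)
topics = sns_conn.get_all_topics()
topic = topics[u'ListTopicsResponse']['ListTopicsResult']['Topics'][0]['TopicArn']

# Look up the existing StatusCheckFailed metric for the instance
metric = cloudwatch_conn.list_metrics(dimensions={'InstanceId': _INSTANCE},
                                      metric_name="StatusCheckFailed")[0]
# create_alarm() builds the MetricAlarm and calls put_metric_alarm() itself
alarm = metric.create_alarm(name=_INSTANCE_NAME + "RECOVERY-High-Status-Check-Failed-Any",
                            description='status check for {} {}'.format(_INSTANCE, _INSTANCE_NAME),
                            alarm_actions=[topic, 'arn:aws:automate:us-east-1:ec2:recover'],
                            ok_actions=[topic],
                            insufficient_data_actions=[topic],
                            statistic='Average',
                            comparison='>=',
                            threshold=1.0,
                            period=60,
                            evaluation_periods=5)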
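For comparison, here is a sketch of the same alarm for boto3, the successor SDK. This is a swapped-in library, not what the question or the answer uses. The hypothetical `recover_alarm_params` function only builds the keyword arguments for boto3's CloudWatch client `put_metric_alarm`, so the result can be inspected without credentials. It assumes the `StatusCheckFailed_System` metric instead of `StatusCheckFailed`, because AWS documents the recover action in terms of the system status check. Check the current EC2 documentation before relying on that.

```python
def recover_alarm_params(instance_id, instance_name, region, topic_arn=None,
                         period=60, evaluation_periods=5):
    """Build kwargs for boto3's CloudWatch put_metric_alarm (hypothetical helper)."""
    # The recover action ARN has no account ID and must use the instance's region.
    actions = ["arn:aws:automate:%s:ec2:recover" % region]
    if topic_arn:
        # Optionally also notify an SNS topic when the alarm fires.
        actions.append(topic_arn)
    params = {
        "AlarmName": instance_name + "RECOVERY-High-Status-Check-Failed-System",
        "AlarmDescription": "status check for %s %s" % (instance_id, instance_name),
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",  # assumption, see lead-in
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 1.0,
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "AlarmActions": actions,
    }
    if topic_arn:
        params["OKActions"] = [topic_arn]
        params["InsufficientDataActions"] = [topic_arn]
    return params


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   cw = boto3.client("cloudwatch", region_name="us-east-1")
#   cw.put_metric_alarm(**recover_alarm_params("i-0123456789abcdef0", "web1", "us-east-1"))
```

Keeping the parameter construction separate from the API call makes it easy to print or unit-test the alarm definition, including the exact action ARNs, before anything is sent to CloudWatch.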