AWS EC2: How to prevent an infinite loop in an instance status check?

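I have the following Python boto3 code, which contains a while loop that could run forever. Normally the loop succeeds after a few minutes, but if something fails on the AWS side, the program could hang indefinitely.

I'm sure this is not the most appropriate way to do it.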

# credentials stored in ../.aws/credentials
# region stored in ../.aws/config

# builtins
from time import sleep
# plugins
import boto3

# Assign server instance IDs.
cye_production_web_server_2 = 'i-FAKE-ID'

# Setup EC2 client
ec2 = boto3.client('ec2')

# Start the second web server.
start_response = ec2.start_instances(
    InstanceIds=[cye_production_web_server_2, ],
    DryRun=False
)

print(
    'instance id:',
    start_response['StartingInstances'][0]['InstanceId'],
    'is',
    start_response['StartingInstances'][0]['CurrentState']['Name']
)

# Wait until status is 'ok'
status = None
while status != 'ok':
    status_response = ec2.describe_instance_status(
        DryRun=False,
        InstanceIds=[cye_production_web_server_2, ],
    )
    status = status_response['InstanceStatuses'][0]['SystemStatus']['Status']
    sleep(5)    # 5 second throttle

print(status_response)
print('status is', status.capitalize())

You could try doing it in a for loop with a fixed number of attempts instead.

For example:

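MAX_RETRIES = 5

# Try until status is 'ok', giving up after MAX_RETRIES attempts
for attempt in range(MAX_RETRIES):
    status_response = ec2.describe_instance_status(
        DryRun=False,
        InstanceIds=[cye_production_web_server_2],
    )
    status = status_response['InstanceStatuses'][0]['SystemStatus']['Status']
    if status == 'ok':
        break
    sleep(5)    # 5 second throttle
else:
    # The loop used up every attempt without seeing 'ok'
    raise RuntimeError('system status did not become ok in time')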
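
One caveat about the status lookup used in these loops: by default `describe_instance_status` only describes running instances, so shortly after `start_instances` the `InstanceStatuses` list may still be empty, and indexing it with `[0]` would raise an `IndexError`. Below is a minimal sketch of a more defensive lookup. `get_system_status` is a hypothetical helper name (not part of boto3), and the sample dictionaries are hand-written to mimic only the response keys used in the question.

```python
def get_system_status(status_response):
    """Return the SystemStatus string, or None if no status is reported yet."""
    statuses = status_response.get('InstanceStatuses', [])
    if not statuses:
        # Nothing reported yet (for example, the instance is not running)
        return None
    return statuses[0].get('SystemStatus', {}).get('Status')


# Hand-written sample responses (assumed shape, not real AWS output)
empty_response = {'InstanceStatuses': []}
ok_response = {'InstanceStatuses': [{'SystemStatus': {'Status': 'ok'}}]}

print(get_system_status(empty_response))
print(get_system_status(ok_response))
```

In a polling loop, a `None` result can then be treated the same as any other not-yet-ok status: sleep and try again.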

Implement a counter in the loop and fail after a number of attempts:

status = None
counter = 5    # maximum number of attempts
while status != 'ok' and counter > 0:
    status_response = ec2.describe_instance_status(
        DryRun=False,
        InstanceIds=[cye_production_web_server_2],
    )
    status = status_response['InstanceStatuses'][0]['SystemStatus']['Status']
    sleep(5)    # 5 second throttle
    counter -= 1

# Fail explicitly if the attempts ran out before the status became 'ok'
if status != 'ok':
    raise RuntimeError('system status did not become ok in time')

print(status_response)
print('status is', status.capitalize())

Using a timeout might be a better idea:

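import time

# ID of the instance to check (the variable from the question)
instance_id = cye_production_web_server_2
# Give up after this many minutes (example value; adjust as needed)
minutes = 5

systemstatus = False
timeout = time.time() + 60 * minutes

while not systemstatus:
    status = ec2.describe_instance_status(
        DryRun=False,
        InstanceIds=[instance_id],
    )

    if status['InstanceStatuses'][0]['SystemStatus']['Status'] == 'ok':
        systemstatus = True
    elif time.time() > timeout:
        break    # timed out; systemstatus is still False
    else:
        time.sleep(10)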
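
The timeout idea can also be factored into a small reusable helper, so the deadline logic is separate from the AWS call. The following is only a sketch under stated assumptions: `wait_until` and its parameters are hypothetical names, not part of boto3, and the `fake_check` function stands in for a real status check so the example runs without AWS access.

```python
import time


def wait_until(check, timeout_seconds, interval_seconds,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns True or the deadline passes.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = clock() + timeout_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


# Stand-in for a real status check: succeeds on the third call
calls = []

def fake_check():
    calls.append(1)
    return len(calls) >= 3

print(wait_until(fake_check, timeout_seconds=1.0, interval_seconds=0.01))
print(wait_until(lambda: False, timeout_seconds=0.05, interval_seconds=0.01))
```

With boto3, `check` could be a small function that calls `ec2.describe_instance_status(...)` and compares the reported status to 'ok'. The caller then decides what a `False` result means, for example raising an error or alerting. boto3 also ships a built-in waiter, `ec2.get_waiter('instance_status_ok')`, which polls with a bounded number of attempts and may be worth checking in the boto3 documentation before writing a custom loop.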