AWS Auto Scaling Group does not detect instance is unhealthy from ELB

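I'm trying to get an AWS Auto Scaling group to replace "unhealthy" instances, but I can't get it to work.

From the console, I created a launch configuration and, from that, an Auto Scaling group with an Application Load Balancer. I left all settings for the target group and the listeners at their defaults. I selected "ELB" as an additional health check type for the Auto Scaling group. I deliberately misconfigured the launch configuration so that the instances are "broken": no web server is listening on the port configured in the listener.

The Auto Scaling group seems to be configured correctly and definitely knows about the load balancer. However, it considers the instance it launched to be healthy.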

// output of aws autoscaling describe-auto-scaling-groups:

{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "MyAutoScalingGroup",
            "AutoScalingGroupARN": "arn:aws:autoscaling:eu-west-1:<accountId>:autoScalingGroup:3edc728f-0831-46b9-bbcc-16691adc8f44:autoScalingGroupName/MyAutoScalingGroup",
            "LaunchConfigurationName": "MyLaunchConfiguration",
            "MinSize": 1,
            "MaxSize": 3,
            "DesiredCapacity": 1,
            "DefaultCooldown": 300,
            "AvailabilityZones": [
                "eu-west-1b",
                "eu-west-1c",
                "eu-west-1a"
            ],
            "LoadBalancerNames": [],
            "TargetGroupARNs": [
                "arn:aws:elasticloadbalancing:eu-west-1:<accountId>:targetgroup/MyAutoScalingGroup-1/1e36c863abaeb6ff"
            ],
            "HealthCheckType": "ELB",
            "HealthCheckGracePeriod": 300,
            "Instances": [
                {
                    "InstanceId": "i-0b589d33100e4e515",
                    // ...
                    "LifecycleState": "InService",
                    "HealthStatus": "Healthy",
                    // ...
                }
            ],
            // ...
        }
    ]
}

The load balancer, however, is well aware that the instance is unhealthy:

// output of aws elbv2 describe-target-health:

{
    "TargetHealthDescriptions": [
        {
            "Target": {
                "Id": "i-0b589d33100e4e515",
                "Port": 80
            },
            "HealthCheckPort": "80",
            "TargetHealth": {
                "State": "unhealthy",
                "Reason": "Target.Timeout",
                "Description": "Request timed out"
            }
        }
    ]
}

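Am I misunderstanding the documentation? If not, what else is needed for the Auto Scaling group to learn that this instance is unhealthy and replace it?

To be clear, when I mark instances as unhealthy manually (that is, with aws autoscaling set-instance-health), they are replaced as expected.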
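
A minimal sketch of that manual check, wrapped in a hypothetical helper named mark_unhealthy (the function name is mine; the CLI command is the one named above). It assumes the AWS CLI is installed and has credentials for the account:

```shell
# Hypothetical helper: manually mark one instance as unhealthy so that the
# Auto Scaling group replaces it. Assumes configured AWS CLI credentials.
mark_unhealthy() {
    instance_id="$1"
    aws autoscaling set-instance-health \
        --instance-id "$instance_id" \
        --health-status Unhealthy
}

# Usage (instance ID taken from the output above):
#   mark_unhealthy i-0b589d33100e4e515
```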

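Explanation

If you deliberately misconfigured the instance from the start and the ELB health check has never passed, then the Auto Scaling group has not yet acknowledged that your ELB/target group is up and running. See this page of the documentation: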

After at least one registered instance passes the health checks, it enters the InService state.

If no registered instances pass the health checks (for example, due to a misconfigured health check), ... Amazon EC2 Auto Scaling doesn't terminate and replace the instances.

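I set this up from scratch and reproduced the behavior you describe. To verify that this is indeed the root cause, check the state of the target group in the ASG. It is probably in the Added state rather than InService.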

[cloudshell-user@ip-10-0-xx-xx ~]$ aws autoscaling describe-load-balancer-target-groups --auto-scaling-group-name test-asg
{
    "LoadBalancerTargetGroups": [
        {
            "LoadBalancerTargetGroupARN": "arn:aws:elasticloadbalancing:us-east-1:xxx:targetgroup/asg-test-1/abc",
            "State": "Added"
        }
    ]
}
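
If you want to script this check, something like the following may work. It is only a sketch: the function names target_group_states and all_target_groups_in_service are hypothetical, and it relies on the CLI's --query and --output text options to avoid parsing JSON by hand.

```shell
# Hypothetical helper: print the state of every target group attached to
# an Auto Scaling group as whitespace-separated text.
target_group_states() {
    aws autoscaling describe-load-balancer-target-groups \
        --auto-scaling-group-name "$1" \
        --query 'LoadBalancerTargetGroups[].State' \
        --output text
}

# Succeeds only if at least one target group is attached and every
# attached target group reports "InService".
all_target_groups_in_service() {
    states="$(target_group_states "$1")" || return 2
    [ -n "$states" ] || return 1
    for state in $states; do
        [ "$state" = "InService" ] || return 1
    done
    return 0
}

# Usage:
#   all_target_groups_in_service test-asg && echo "ASG honors ELB health checks"
```

While this check fails, the behavior quoted from the documentation applies and the ASG will not replace instances based on ELB health.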

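Resolution

Here is what I did to get the expected behavior:

  1. Run a simple web service on port 80. Make sure the security group allows the ELB to communicate with the EC2 instance.
  2. Wait until the ELB reports the target as healthy. Make sure the server returns 200. You may need to create an empty index.html to pass the health check.
  3. Wait until the target group's state in the ASG changes to InService.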
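
For step 1, any web server will do. As one possible sketch, assuming python3 (3.7 or later, for --directory) is available on the instance and that the target group keeps its default health check, the helpers below create an empty index.html and serve it. The names prepare_docroot and serve_docroot are hypothetical.

```shell
# Hypothetical helper: create a document root that contains an empty
# index.html, as suggested in step 2.
prepare_docroot() {
    docroot="$1"
    mkdir -p "$docroot" && : > "$docroot/index.html"
}

# Hypothetical helper: serve the document root over HTTP in the foreground.
# Assumes python3 3.7+ is installed; binding to port 80 normally needs root.
serve_docroot() {
    python3 -m http.server "${2:-80}" --directory "$1"
}

# Usage (run on the instance):
#   prepare_docroot /tmp/www && sudo sh -c 'python3 -m http.server 80 --directory /tmp/www'
```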

For example, for step 3:

[cloudshell-user@ip-10-0-xx-xx ~]$ aws autoscaling describe-load-balancer-target-groups --auto-scaling-group-name test-asg
{
    "LoadBalancerTargetGroups": [
        {
            "LoadBalancerTargetGroupARN": "arn:aws:elasticloadbalancing:us-east-1:xxx:targetgroup/test-asg-1-alb/abcdef",
            "State": "InService"
        }
    ]
}
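
Rather than rerunning the command by hand, step 3 could be polled. The following is a rough sketch; wait_for_in_service is a hypothetical name, and the default of 40 attempts at 15-second intervals is an arbitrary assumption, not a value from the documentation.

```shell
# Hypothetical helper: poll until every attached target group reports
# "InService" to the Auto Scaling group, or give up after max_tries.
wait_for_in_service() {
    asg_name="$1"
    max_tries="${2:-40}"   # assumed default: 40 attempts
    delay="${3:-15}"       # assumed default: 15 seconds between attempts
    try=1
    while [ "$try" -le "$max_tries" ]; do
        states="$(aws autoscaling describe-load-balancer-target-groups \
            --auto-scaling-group-name "$asg_name" \
            --query 'LoadBalancerTargetGroups[].State' \
            --output text)" || return 2
        pending=0
        [ -n "$states" ] || pending=1          # nothing attached yet
        for state in $states; do
            [ "$state" = "InService" ] || pending=1
        done
        if [ "$pending" -eq 0 ]; then
            echo "in service after $try attempt(s)"
            return 0
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    echo "timed out; last state: $states" >&2
    return 1
}

# Usage:
#   wait_for_in_service test-asg
```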

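Now that the target group is InService, stop the web server and wait. Check frequently, though, because the instance is terminated as soon as the ASG detects that it is unhealthy.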

[cloudshell-user@ip-10-0-xx-xx ~]$ aws autoscaling describe-auto-scaling-groups
{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "test-asg",
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-east-1:xxx:autoScalingGroup:abc-def-ghi:autoScalingGroupName/test-asg",
            ...
            "LoadBalancerNames": [],
            "TargetGroupARNs": [
                "arn:aws:elasticloadbalancing:us-east-1:xxx:targetgroup/test-asg-1-alb/abc"
            ],
            "HealthCheckType": "ELB",
            "HealthCheckGracePeriod": 300,
            "Instances": [
                {
                    "InstanceId": "i-04bed6ef3b2000326",
                    "InstanceType": "t2.micro",
                    "AvailabilityZone": "us-east-1b",
                    "LifecycleState": "Terminating",
                    "HealthStatus": "Unhealthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0452c90319362cbc5",
                        "LaunchTemplateName": "test-template",
                        "Version": "1"
                    },
             ...
        },
    ...
    ]
}
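
To catch the instance before it disappears, a condensed view of the same output can help. This is a sketch only: asg_instance_health and unhealthy_instances are hypothetical names, and the awk filter assumes the three tab-separated columns produced by the --query expression shown.

```shell
# Hypothetical helper: list each instance in the group with its lifecycle
# state and health status as seen by the Auto Scaling group.
asg_instance_health() {
    aws autoscaling describe-auto-scaling-groups \
        --auto-scaling-group-names "$1" \
        --query 'AutoScalingGroups[].Instances[].[InstanceId,LifecycleState,HealthStatus]' \
        --output text
}

# Hypothetical helper: print only the IDs of instances the ASG currently
# considers unhealthy (third column equals "Unhealthy").
unhealthy_instances() {
    asg_instance_health "$1" | awk '$3 == "Unhealthy" { print $1 }'
}

# Usage:
#   unhealthy_instances test-asg
```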