Error in AWS Autoscaling configuration with Terraform

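I am trying to set up an auto scaling environment using an AWS Auto Scaling group and a launch configuration.

Below are my tfvars for the launch configuration.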

config_name = "name"
image_id = "ami-test"
instance_type = "c4.large"
key_name = "EC2-key"
security_groups = ["sg-123456789",
    "sg-123456789099"]
associate_public_ip_address = false
enable_monitoring = true
ebs_optimized = true
root_size = 10
root_volume_type = "standard"
root_encrypted = true
device_name = "/dev/sdf"
ebs_volume = 30
ebs_delete = true
ebs_encrypted = true
ebs_volume_type = "gp2"
iam_instance_profile = "arn:aws:iam::1234567890:instance-profile/EC2ROLE"

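This creates the launch configuration without any issues. The configuration created from the console and the one created by this tfvars run are almost identical.

Below are the tfvars for the Auto Scaling group.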
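For readers reproducing this, the tfvars above need matching variable declarations. The following is a minimal sketch covering a few of the keys; the types are inferred from the values shown and the descriptions are assumptions, not taken from the original project.

```hcl
# variables.tf (sketch): declarations for some of the tfvars keys above.
# Types are inferred from the sample values; descriptions are illustrative.
variable "config_name" {
  type        = string
  description = "Name of the launch configuration"
}

variable "security_groups" {
  type        = list(string)
  description = "Security group IDs attached to launched instances"
}

variable "root_size" {
  type        = number
  description = "Root volume size in GiB"
}

variable "root_encrypted" {
  type        = bool
  description = "Whether the root volume is encrypted"
}
```

The remaining keys would follow the same pattern, using `string`, `number`, `bool`, or `list(string)` as appropriate.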

scaling_name = "EC2-Scaling"
vpc_zone_identifier = ["subnet-123456789", "subnet-asdfghfjk"]
max_size = 2
min_size = 1
health_check_type = "ELB"
launch_configuration = "name"
termination_policies = ["NewestInstance",
    "OldestLaunchConfiguration"]
enabled_metrics = ["GroupInServiceCapacity",
    "GroupMaxSize",
    "GroupTotalCapacity",
    "GroupTotalInstances",
    "GroupMinSize"]
health_check_grace_period = 300
policy_name = "autoscaling_policy"

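It also looks fine when checked in the console. However, when the scaling group tries to spin up an instance, it throws the following error.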

Launching a new EC2 instance: i-21358239842. Status Reason: Instance became unhealthy while waiting for instance to be in InService state. Termination Reason: Client.InternalError: Client error on launch

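Please point out any mistakes in what I am doing, or whether I am missing something.

As requested in the comments, here are the resource definitions.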

resource "aws_launch_configuration" "launch_configuration" {
  name = var.config_name
  image_id = var.image_id
  instance_type = var.instance_type
  key_name = var.key_name
  security_groups = var.security_groups
  associate_public_ip_address = var.associate_public_ip_address
  enable_monitoring = var.enable_monitoring
  ebs_optimized = var.ebs_optimized
  
  root_block_device {
    volume_size = var.root_size
    volume_type = var.root_volume_type
    encrypted = var.root_encrypted
  }
  
  ebs_block_device {
    device_name = var.device_name
    volume_size = var.ebs_volume
    delete_on_termination = var.ebs_delete
    encrypted = var.ebs_encrypted
    volume_type = var.ebs_volume_type
  }
  iam_instance_profile  = var.iam_instance_profile
}


resource "aws_autoscaling_group" "autoscaling" {
  name = var.scaling_name
  vpc_zone_identifier        = var.vpc_zone_identifier  
  max_size = var.max_size
  min_size = var.min_size
  health_check_type = var.health_check_type
  launch_configuration = var.launch_configuration
  termination_policies = var.termination_policies
  enabled_metrics = var.enabled_metrics
  
  instance_refresh {
    strategy = "Rolling"
  }
  
  health_check_grace_period = var.health_check_grace_period
  wait_for_capacity_timeout = 0 ##Skips waiting for capacity and proceeds to create a scaling group
}

resource "aws_autoscaling_policy" "dynamic_scaling" {
  name                   = var.policy_name
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = aws_autoscaling_group.autoscaling.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 40.0
  }
}

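Currently I am considering either of two solutions to solve this issue:

As mentioned by @Arun K, setting up an ALB with a health check to forward requests to the Auto Scaling group, or a ring health check.

From the Terraform manual for aws_autoscaling_group: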
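If no load balancer is attached, one simple alternative is to let the group rely on EC2 status checks instead. This is only a sketch of that option; whether it matches the second solution the author had in mind is an assumption. The `health_check_type` argument accepts "EC2" or "ELB".

```hcl
# terraform.tfvars (sketch): use EC2 status checks instead of load balancer
# health checks, so no load balancer has to be attached to the group.
health_check_type = "EC2"
```

With this value the group does not consult any load balancer when deciding whether an instance is healthy.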

wait_for_capacity_timeout (Default: "10m") A maximum duration that Terraform should wait for ASG instances to be healthy before timing out. (See also Waiting for Capacity below.) Setting this to "0" causes Terraform to skip all Capacity Waiting behavior.

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group

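Judging from the EC2 error, I think the instance is unhealthy because it cannot communicate. 0 seconds is too short for an EC2 instance to go from initializing to in service, and the check happens right after the "aws_autoscaling_group" resource is triggered in Terraform. If I were a web user (or a health check) hitting an EC2 instance that is still initializing, I would get a 500, not a 500-but-ec2-will-be-spun-up-soon-try-again-in-a-minute. In resource "aws_autoscaling_group" "autoscaling", try giving it a value: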

wait_for_capacity_timeout = "300s"  # duration string; a bare 300 has no unit

I based this on your other value:

health_check_grace_period = 300

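That value means the group waits 300 seconds after the EC2 instance comes into service before running health checks.

Thanks to @Arunk, who pointed out the mistake in the Auto Scaling group configuration.

The main cause of the error was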
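As a sketch of how the two settings sit together, here is the group from the question with both arguments set, using the question's variable names. Per the manual excerpt quoted above, `wait_for_capacity_timeout` bounds how long Terraform waits for instances to become healthy, while `health_check_grace_period` is the group's own delay before health checks count. The "10m" value is the documented default and is used here only for illustration.

```hcl
resource "aws_autoscaling_group" "autoscaling" {
  name                 = var.scaling_name
  vpc_zone_identifier  = var.vpc_zone_identifier
  max_size             = var.max_size
  min_size             = var.min_size
  health_check_type    = var.health_check_type
  launch_configuration = var.launch_configuration

  # Seconds the group waits after launch before acting on health checks.
  health_check_grace_period = var.health_check_grace_period

  # How long Terraform itself waits for healthy capacity ("0" skips waiting).
  wait_for_capacity_timeout = "10m"
}
```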

resource "aws_autoscaling_group" "autoscaling" {
..
health_check_type = "ELB"
..

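I had specified that the health check is performed by the Elastic Load Balancer, but I had not attached the Auto Scaling group to a load balancer. All I had to do was create the complete stack below.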

resource "aws_lb" "example" {
  load_balancer_type = "gateway"
  name               = "example"

  subnet_mapping {
    subnet_id = aws_subnet.example.id
  }
}

resource "aws_lb_target_group" "example" {
  name     = "example"
  port     = 6081
  protocol = "GENEVE"
  vpc_id   = aws_vpc.example.id

  health_check {
    port     = 80
    protocol = "HTTP"
  }
}

resource "aws_lb_listener" "example" {
  load_balancer_arn = aws_lb.example.id

  default_action {
    target_group_arn = aws_lb_target_group.example.id
    type             = "forward"
  }
}
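
resource "aws_autoscaling_attachment" "asg_attachment_bar" {
  # References point at the resources defined in this post.
  autoscaling_group_name = aws_autoscaling_group.autoscaling.name
  # Older AWS provider versions name this argument alb_target_group_arn.
  lb_target_group_arn    = aws_lb_target_group.example.arn
}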

Note: code copied from the Terraform site.

With this setup in place, the error I was getting was resolved.
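For an HTTP service, the same fix can be sketched with an Application Load Balancer rather than the gateway example copied above. This is an illustration under assumptions, not the author's exact stack: `var.vpc_id` and `var.lb_security_groups` are hypothetical inputs, port 80 and the "/" health check path are placeholders, and the group is attached through `target_group_arns` instead of a separate `aws_autoscaling_attachment`.

```hcl
# Hypothetical inputs, not part of the original question.
variable "vpc_id" { type = string }
variable "lb_security_groups" { type = list(string) }

resource "aws_lb" "app" {
  name               = "example-alb"
  load_balancer_type = "application"
  internal           = true
  security_groups    = var.lb_security_groups
  subnets            = var.vpc_zone_identifier
}

resource "aws_lb_target_group" "app" {
  name     = "example-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # Instances count as healthy once this check returns HTTP 200.
  health_check {
    path                = "/"
    protocol            = "HTTP"
    matcher             = "200"
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

resource "aws_autoscaling_group" "autoscaling" {
  name                      = var.scaling_name
  vpc_zone_identifier       = var.vpc_zone_identifier
  max_size                  = var.max_size
  min_size                  = var.min_size
  launch_configuration      = aws_launch_configuration.launch_configuration.name
  health_check_type         = "ELB"
  health_check_grace_period = var.health_check_grace_period

  # Attaching the target group here gives the "ELB" health check something to query.
  target_group_arns = [aws_lb_target_group.app.arn]
}
```

Use either `target_group_arns` on the group or an `aws_autoscaling_attachment` resource, not both, since the two approaches manage the same attachment and can conflict.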