Can't execute either a user_data script or remote-exec with a connection block while launching an EC2 instance with Terraform Cloud [aws-provider]

Question

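I have created an AWS infrastructure with network ACLs, security groups, subnets, etc. [code attached at the bottom], all within the free tier. I have also established an SSH connection to my EC2 instance, and I can download packages manually while logged in to the instance.

However, since I want to make full use of Terraform, I would like to preinstall a few things while Terraform creates the instance.

The commands I want to run are simple (install the JDK, Python, and Docker):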

user_data= <<-EOF
#! /bin/bash
    echo "Installing modules..."
    sudo apt-get update
    sudo apt-get install -y openjdk-8-jdk
    sudo apt install -y python2.7 python-pip
    sudo apt install -y docker.io
    sudo systemctl start docker
    sudo systemctl enable docker
    pip install setuptools
    echo "Modules installed via Terraform"
EOF

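My first approach was to use the user_data argument. Even though the EC2 instance has internet access, none of the specified modules were installed. I then used a remote-exec block together with the connection block provided by Terraform. But, as many before us have experienced, Terraform could not establish a successful connection to the host and returned the message shown below.

remote-exec block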

connection {
  type        = "ssh"
  host        = aws_eip.prod_server_public_ip.public_ip //Error: host for provisioner cannot be empty -> https://github.com/hashicorp/terraform-provider-aws/issues/10977
  user        = "ubuntu"
  private_key = "${chomp(tls_private_key.ssh_key_prod.private_key_pem)}"
  timeout     = "1m"
}

provisioner "remote-exec" {
  inline = [
    "echo 'Installing modules...'",
    "sudo apt-get update",
    "sudo apt-get install -y openjdk-8-jdk",
    "sudo apt install -y python2.7 python-pip",
    "sudo apt install -y docker.io",
    "sudo systemctl start docker",
    "sudo systemctl enable docker",
    "pip install setuptools",
    "echo 'Modules installed via Terraform'"
  ]
  on_failure = fail
}

i/o timeout message

Connecting to remote host via SSH...
module.virtual_machines.null_resource.install_modules (remote-exec):   Host: 3.137.111.207
module.virtual_machines.null_resource.install_modules (remote-exec):   User: ubuntu
module.virtual_machines.null_resource.install_modules (remote-exec):   Password: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Private key: true
module.virtual_machines.null_resource.install_modules (remote-exec):   Certificate: false
module.virtual_machines.null_resource.install_modules (remote-exec):   SSH Agent: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Checking Host Key: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Target Platform: unix

timeout - last error: dial tcp 52.15.178.40:22: i/o timeout

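One possible root cause I can think of is that I only allow 2 specific IP addresses through the security group's inbound rules. So when Terraform tries to connect, it reaches the security group from an unknown IP. If that is the case, which IP address do I need to allow so that Terraform can connect to my VM and preinstall the packages?

Terraform code for the infrastructure.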
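To make the suspected restriction concrete, here is a minimal sketch of an SSH ingress rule limited to a fixed list of addresses. The variable name `allowed_ssh_cidrs`, the resource name `ssh_inbound_restricted`, and the example values (documentation address ranges) are hypothetical; `aws_security_group.sg_prod` is assumed to be the security group attached to the instance. With a rule like this, whatever machine runs the provisioner has to connect from one of the listed ranges, and when a Terraform Cloud workspace uses remote execution, that machine is a Terraform Cloud worker rather than the local workstation.

```hcl
# Hypothetical variable: the only CIDR ranges that may open SSH connections.
variable "allowed_ssh_cidrs" {
  description = "CIDR ranges allowed to open SSH connections (example values only)"
  type        = list(string)
  default     = ["203.0.113.10/32", "198.51.100.20/32"]
}

# Inbound rule for port 22 that admits only the ranges listed above.
resource "aws_security_group_rule" "ssh_inbound_restricted" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = var.allowed_ssh_cidrs
  security_group_id = aws_security_group.sg_prod.id # assumed security group name
  description       = "allow ssh only from the listed addresses"
}
```

Any SSH attempt from an address outside these ranges is silently dropped by the security group, which on the client side typically shows up as a connection timeout rather than a refusal.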

Answer 1

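I ran your code in my sandbox environment, and remote-exec works. I had to make a few changes just to get your code to run at all (region, AMI, security groups, ...), so you can look at the modified code and take it from there. The code below works for me without any problems.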

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}



variable "prefix" {
    default="my"
}

# Create virtual private cloud (vpc)
resource "aws_vpc" "vpc_prod" {
  cidr_block = "10.0.0.0/16" #or 10.0.0.0/16
  enable_dns_hostnames = true
  enable_dns_support = true

  tags = {
      Name = "production-private-cloud"
  }
}

# Assign gateway to vpc
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.vpc_prod.id

  tags = {
      Name = "production-igw"
  }
}

# ---------------------------------------- Step 1: Create two subnets ----------------------------------------
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "subnet_prod" {
  vpc_id            = aws_vpc.vpc_prod.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a" #data.aws_availability_zones.available.names[0]
  depends_on        = [aws_internet_gateway.gw]

  map_public_ip_on_launch = true

  tags = {
      Name = "main-public-1"
  }
}

resource "aws_subnet" "subnet_prod_id2" {
  vpc_id            = aws_vpc.vpc_prod.id
  cidr_block        = "10.0.2.0/24" //a second subnet can't use the same cidr block as the first subnet
  availability_zone = "us-east-1b" #data.aws_availability_zones.available.names[1]
  depends_on        = [aws_internet_gateway.gw]

  tags = {
        Name = "main-public-2"
    }
}

# ---------------------------------------- Step 2: Create ACL network/ rules ----------------------------------------
resource "aws_network_acl" "production_acl_network" {
  vpc_id = aws_vpc.vpc_prod.id
  subnet_ids = [aws_subnet.subnet_prod.id, aws_subnet.subnet_prod_id2.id] #assign the created subnets to the acl network, otherwise the NACL is assigned to a default subnet

  tags = {
    Name = "production-network-acl"
  }
}

# Create acl rules for the network
# ACL inbound
resource "aws_network_acl_rule" "all_inbound_traffic_acl" {
  network_acl_id = aws_network_acl.production_acl_network.id
  rule_number    = 180
  protocol       = -1
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 0
  to_port        = 0
}

# ACL outbound
resource "aws_network_acl_rule" "all_outbound_traffic_acl" {
  network_acl_id = aws_network_acl.production_acl_network.id
  egress         = true
  protocol       = -1
  rule_action    = "allow"
  rule_number    = 180
  cidr_block     = "0.0.0.0/0"
  from_port      = 0
  to_port        = 0
}

# ---------------------------------------- Step 3: Create security group/ rules ----------------------------------------
resource "aws_security_group" "sg_prod" {
    name   = "production-security-group"
    vpc_id = aws_vpc.vpc_prod.id
}

# Create first (inbound) security rule to open port 22 for ssh connection request
resource "aws_security_group_rule" "ssh_inbound_rule_prod" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] #aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 22 for ssh connection"
}

# Create second (inbound) security rule to allow pings of public ip address of ec2 instance from local machine
resource "aws_security_group_rule" "ping_public_ip_sg_rule" {
  type              = "ingress"
  from_port         = 8
  to_port           = 0
  protocol          = "icmp"
  cidr_blocks       = ["0.0.0.0/0"] #aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "allow pinging elastic public ipv4 address of ec2 instance from local machine"
}

#--------------------------------

# Create first (outbound) security rule to open port 80 for HTTP requests (this will help to download packages while connected to vm)
resource "aws_security_group_rule" "http_outbound_rule_prod" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] #aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 80 for outbound connection with http from remote server"
}

# Create second (outbound) security rule to open port 443 for HTTPS requests
resource "aws_security_group_rule" "https_outbound_rule_prod" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] #aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 443 for outbound connection with https from remote server"
}

# ---------------------------------------- Step 4: SSH key generated for accessing VM ----------------------------------------
resource "tls_private_key" "ssh_key_prod" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# ---------------------------------------- Step 5: Generate aws_key_pair ----------------------------------------
resource "aws_key_pair" "generated_key_prod" {
  key_name   = "${var.prefix}-server-ssh-key"
  public_key = tls_private_key.ssh_key_prod.public_key_openssh

  tags   = {
    Name = "SSH key pair for production server"
  }
}

# ---------------------------------------- Step 6: Create network interface ----------------------------------------

# Create network interface
resource "aws_network_interface" "network_interface_prod" {
  subnet_id       = aws_subnet.subnet_prod.id
  security_groups = [aws_security_group.sg_prod.id]
  #private_ip      = aws_eip.prod_server_public_ip.private_ip #!!! not sure if this argument is correct !!!
  description     = "Production server network interface"

  tags   = {
    Name = "production-network-interface"
  }
}

# ---------------------------------------- Step 7: Create the Elastic Public IP after having created the network interface ----------------------------------------

resource "aws_eip" "prod_server_public_ip" {
  vpc               = true
  #instance          = aws_instance.production_server.id
  network_interface = aws_network_interface.network_interface_prod.id
  #don't specify both instance and a network_interface id, one of the two!

  depends_on        = [aws_internet_gateway.gw, aws_network_interface.network_interface_prod]
  tags   = {
    Name = "production-elastic-ip"
  }
}

# ---------------------------------------- Step 8: Associate public ip to network interface ----------------------------------------

resource "aws_eip_association" "eip_assoc" {
  #dont use instance, network_interface_id at the same time
  #instance_id   = aws_instance.production_server.id
  allocation_id = aws_eip.prod_server_public_ip.id
  network_interface_id = aws_network_interface.network_interface_prod.id

  depends_on = [aws_eip.prod_server_public_ip, aws_network_interface.network_interface_prod]
}

# ---------------------------------------- Step 9: Create route table with rules ----------------------------------------

resource "aws_route_table" "route_table_prod" {
  vpc_id = aws_vpc.vpc_prod.id
  tags   = {
    Name = "route-table-production-server"
  }
}

/*documentation =>
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Routing
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-set-up.html?icmpid=docs_ec2_console#ec2-instance-connect-setup-security-group
*/

resource "aws_route" "route_prod_all" {
  route_table_id         = aws_route_table.route_table_prod.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.gw.id
  depends_on             = [
    aws_route_table.route_table_prod, aws_internet_gateway.gw
  ]
}

# Create main route table association with the two subnets
resource "aws_main_route_table_association" "main-route-table" {
  vpc_id         = aws_vpc.vpc_prod.id
  route_table_id = aws_route_table.route_table_prod.id
}

resource "aws_route_table_association" "main-public-1-a" {
  subnet_id      = aws_subnet.subnet_prod.id
  route_table_id = aws_route_table.route_table_prod.id
}

resource "aws_route_table_association" "main-public-1-b" {
  subnet_id      = aws_subnet.subnet_prod_id2.id
  route_table_id = aws_route_table.route_table_prod.id
}

# ---------------------------------------- Step 10: Create the AWS EC2 instance ----------------------------------------

resource "aws_instance" "production_server" {
  depends_on                  = [aws_eip.prod_server_public_ip, aws_network_interface.network_interface_prod]
  ami                         = "ami-09e67e426f25ce0d7"
  instance_type               = "t2.micro"
  #key_name                    = "MyKeyPair"#aws_key_pair.generated_key_prod.key_name
  key_name                    = aws_key_pair.generated_key_prod.key_name

  network_interface {
    network_interface_id = aws_network_interface.network_interface_prod.id
    device_index         = 0
  }

  ebs_block_device {
    device_name = "/dev/sda1"
    volume_type = "standard"
    volume_size = 8
  }

  connection {
    type        = "ssh"
    host        = aws_eip.prod_server_public_ip.public_ip #Error: host for provisioner cannot be empty -> https://github.com/hashicorp/terraform-provider-aws/issues/10977
    user        = "ubuntu"
    private_key = tls_private_key.ssh_key_prod.private_key_pem
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = [
      "echo 'Installing modules...'",
      "sudo apt-get update",
      "sudo apt-get install -y openjdk-8-jdk",
      "sudo apt install -y python2.7 python-pip",
      "sudo apt install -y docker.io",
      "sudo systemctl start docker",
      "sudo systemctl enable docker",
      "pip install setuptools",
      "echo 'Modules installed via Terraform'"
    ]
    on_failure = fail
  }

  #user_data= <<-EOF
        #! /bin/bash
    #echo "Installing modules..."
    #sudo apt-get update
    #sudo apt-get install -y openjdk-8-jdk
    #sudo apt install -y python2.7 python-pip
    #sudo apt install -y docker.io
    #sudo systemctl start docker
    #sudo systemctl enable docker
    #pip install setuptools
    #echo "Modules installed via Terraform"
    #EOF

  tags   = {
    Name = "production-server"
  }

  volume_tags = {
    Name = "production-volume"
  }
}
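For comparison, the commented-out user_data variant above could be written as a local value, which avoids SSH altogether. This is only a sketch under a few assumptions: the local name `install_modules_script` is hypothetical, the package names are copied unchanged from the question and may not all exist on every Ubuntu release, and the `sudo` prefixes are dropped because cloud-init runs user data scripts as root.

```hcl
locals {
  # Hypothetical name. The script is executed by cloud-init as root on the
  # first boot of the instance, so sudo is not required.
  # "<<-" strips the common leading indentation, so the shebang ends up
  # at the very start of the first line.
  install_modules_script = <<-EOF
    #!/bin/bash
    echo "Installing modules..."
    apt-get update
    apt-get install -y openjdk-8-jdk
    apt-get install -y python2.7 python-pip
    apt-get install -y docker.io
    systemctl start docker
    systemctl enable docker
    pip install setuptools
    echo "Modules installed via Terraform"
  EOF
}

# In aws_instance.production_server, the connection and provisioner blocks
# would then be replaced by a single argument:
#   user_data = local.install_modules_script
```

By default the script runs only on the instance's first boot, and on Ubuntu cloud images its output is written to /var/log/cloud-init-output.log, which is the first place to look when the packages do not appear.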


amazon-ec2

amazon-web-services

terraform

terraform-provider-aws