Terraform gets stuck when instance_count is more than 2 while using remote-exec provisioner
- I am trying to provision multiple Windows EC2 instances through Terraform's remote-exec provisioner using null_resource.
$ terraform -v
Terraform v0.12.6
provider.aws v2.23.0
provider.null v2.1.2
- Initially, I was using three remote-exec provisioners (two of which involve restarting the instances), no null_resource, and for a single instance everything worked absolutely fine.
- Then I needed to increase the count and, based on a few links, ended up using null_resource. I have since reduced the problem to the point where I cannot even run a single remote-exec provisioner through null_resource for more than 2 Windows EC2 instances.
Terraform template to reproduce the error message:
//VARIABLES
variable "aws_access_key" {
default = "AK"
}
variable "aws_secret_key" {
default = "SAK"
}
variable "instance_count" {
default = "3"
}
variable "username" {
default = "Administrator"
}
variable "admin_password" {
default = "Password"
}
variable "instance_name" {
default = "Testing"
}
variable "vpc_id" {
default = "vpc-id"
}
//PROVIDERS
provider "aws" {
access_key = "${var.aws_access_key}"
secret_key = "${var.aws_secret_key}"
region = "ap-southeast-2"
}
//RESOURCES
resource "aws_instance" "ec2instance" {
count = "${var.instance_count}"
ami = "Windows AMI"
instance_type = "t2.xlarge"
key_name = "ec2_key"
subnet_id = "subnet-id"
vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]
tags = {
Name = "${var.instance_name}-${count.index}"
}
}
resource "null_resource" "nullresource" {
count = "${var.instance_count}"
connection {
type = "winrm"
host = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
user = "${var.username}"
password = "${var.admin_password}"
timeout = "10m"
}
provisioner "remote-exec" {
inline = [
"powershell.exe Write-Host Instance_No=${count.index}"
]
}
// provisioner "local-exec" {
// command = "powershell.exe Write-Host Instance_No=${count.index}"
// }
// provisioner "file" {
// source = "testscript"
// destination = "D:/testscript"
// }
}
resource "aws_security_group" "ec2instance-sg" {
name = "${var.instance_name}-sg"
vpc_id = "${var.vpc_id}"
// RDP
ingress {
from_port = 3389
to_port = 3389
protocol = "tcp"
cidr_blocks = ["CIDR"]
}
// WinRM access from the machine running TF to the instance
ingress {
from_port = 5985
to_port = 5985
protocol = "tcp"
cidr_blocks = ["CIDR"]
}
tags = {
Name = "${var.instance_name}-sg"
}
}
//OUTPUTS
output "private_ip" {
value = "${aws_instance.ec2instance.*.private_ip}"
}
Observations:
- For a single remote-exec provisioner, it works fine if the count is set to 1 or 2. For count 3, which instances actually run it is unpredictable each time; what is certain is that Terraform never completes and never shows the output variables. It keeps displaying "null_resource.nullresource[count.index]: Still creating..."
- For the local-exec provisioner, everything works fine. Tested with count values of 1, 2, and 7.
- For the file provisioner, it works fine for counts of 1, 2, and 3, but does not complete for 7, even though the file is copied onto all 7 instances. It keeps displaying "null_resource.nullresource[count.index]: Still creating..."
- Also, in every attempt the remote-exec provisioner is able to connect to the instances regardless of the count value; it just does not fire the inline command, randomly choosing to skip it and starting to show the "Still creating..." messages.
- I have been stuck on this for quite a while now and cannot find anything significant in the debug logs either. I understand Terraform is not recommended as a configuration management tool; however, the fact that everything works fine with an instance count of 1 (even without null_resource, and even with complex provisioning scripts) suggests that handling a basic provisioning requirement like this should be well within Terraform's reach.
- TF_DEBUG logs:
- count=2: TF completes successfully and shows "Apply complete!".
- count=3: TF runs the remote-exec on all three instances, however it does not complete and does not show the output variables. Stuck at "Still creating..."
- count=3: TF runs the remote-exec on only two instances and skips nullresource[1]; it does not complete and does not show the output variables. Stuck at "Still creating..."
- Any pointers would be greatly appreciated!
UPDATE: What finally did the trick was downgrading Terraform to v0.11.14, as per this issue comment.
A few things you can try:
- Inline the remote-exec provisioner:
resource "aws_instance" "ec2instance" {
count = "${var.instance_count}"
# ...
provisioner "remote-exec" {
connection {
# ...
}
inline = [
# ...
]
}
}
You can then refer to self inside the connection block to get the instance's private IP.
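Putting the pieces together, a minimal sketch of the inlined approach based on the question's own settings (the AMI, key, and subnet placeholders are unchanged from the question) might look like:

```hcl
resource "aws_instance" "ec2instance" {
  count         = var.instance_count
  ami           = "Windows AMI"
  instance_type = "t2.xlarge"
  key_name      = "ec2_key"
  subnet_id     = "subnet-id"

  provisioner "remote-exec" {
    connection {
      type     = "winrm"
      host     = self.private_ip # self refers to the instance being provisioned
      user     = var.username
      password = var.admin_password
      timeout  = "10m"
    }
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }
}
```

This avoids the null_resource entirely, at the cost of tying the provisioner's lifecycle to the instance itself.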
- Add triggers to the null_resource:
resource "null_resource" "nullresource" {
triggers = {
host = "${element(aws_instance.ec2instance.*.private_ip, count.index)}" # Rerun when IP changes
version = "${timestamp()}" # ...or rerun every time
}
# ...
}
Using the triggers attribute you can force the null_resource to be recreated, which re-runs the remote-exec.
I used this trigger in my null_resource and it works perfectly for me. It also works when the number of instances is increased, and it provisions on all the instances. I am provisioning with Terraform and OpenStack.
triggers = {
  instance_ids = join(",", openstack_compute_instance_v2.swarm-cluster-hosts[*].id)
}
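Translated to the question's AWS setup, the same idea would look roughly like this (a sketch only; it recreates the null_resource, and thus re-runs its provisioners, whenever the set of instance IDs changes):

```hcl
resource "null_resource" "nullresource" {
  count = var.instance_count

  triggers = {
    # Recreate (and re-provision) when any instance ID changes
    instance_ids = join(",", aws_instance.ec2instance[*].id)
  }

  # connection and remote-exec blocks as in the question
}
```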
Terraform 0.12.26 resolved a similar issue for me (using multiple file provisioners when deploying multiple VMs).
Hope this helps: https://github.com/hashicorp/terraform/issues/22006