Terraform resource recreation with dynamic AWS RDS instance counts
I have a question about AWS RDS cluster and instance creation.
Environment
We have recently been experimenting with:
Terraform v0.11.11
provider.aws v1.41.0
Background
We are creating some AWS RDS databases. We have a mandate that in some environments (e.g. staging) we may run fewer instances than in others (e.g. production). With this in mind, and not wanting to maintain completely different terraform files per environment, we decided to specify the database resources just once and use a variable for the number of instances, which is set in our staging.tf and production.tf files respectively.
One possible "quirk" of our setup is that the VPC the subnets live in is not defined in terraform; the VPC already exists, created manually via the AWS console, so it is pulled in as a data provider, while the subnets for RDS are specified in terraform. In some environments we may have 3 subnets (1 per AZ), whereas in others we may have only 2, so that part is also dynamic in a sense. Again, to achieve this we use iteration, as shown below.
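For context, here is a minimal sketch of what the VPC data source lookup might look like (the tag-based filter is an assumption on our part; only the vpc_name input appears in the module config below):

variable "vpc_name" {
  description = "Name tag of the pre-existing VPC, provided by the module config"
}

# Assumed lookup: find the manually created VPC by its Name tag.
data "aws_vpc" "vpc" {
  tags {
    Name = "${var.vpc_name}"
  }
}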
Structure
|-/environments
   -/staging
      -staging.tf
   -/production
      -production.tf
|-/resources
   -database.tf
Example environment variables file (dev.tf)
terraform {
  backend "s3" {
    bucket         = "my-bucket-dev"
    key            = "terraform"
    region         = "eu-west-1"
    encrypt        = "true"
    acl            = "private"
    dynamodb_table = "terraform-state-locking"
  }

  required_version = "~> 0.11.8"
}

provider "aws" {
  access_key          = "${var.access_key}"
  secret_key          = "${var.secret_key}"
  region              = "${var.region}"
  version             = "~> 1.33"
  allowed_account_ids = ["XXX"]
}

module "main" {
  source                          = "../../resources"
  vpc_name                        = "test"
  test_db_name                    = "terraform-test-db-dev"
  test_db_instance_count          = 1
  test_db_backup_retention_period = 7
  test_db_backup_window           = "00:57-01:27"
  test_db_maintenance_window      = "tue:04:40-tue:05:10"
  test_db_subnet_count            = 2
  test_db_subnet_cidr_blocks      = ["10.2.4.0/24", "10.2.5.0/24"]
}
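For illustration, a production.tf would pass the same module different values; the instance count, subnet count and CIDR blocks below are hypothetical:

module "main" {
  source                          = "../../resources"
  vpc_name                        = "test"
  test_db_name                    = "terraform-test-db-production"
  test_db_instance_count          = 3
  test_db_backup_retention_period = 7
  test_db_backup_window           = "00:57-01:27"
  test_db_maintenance_window      = "tue:04:40-tue:05:10"
  test_db_subnet_count            = 3
  test_db_subnet_cidr_blocks      = ["10.3.4.0/24", "10.3.5.0/24", "10.3.6.0/24"]
}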
We went with this module-based structure for environment separation mainly based on these discussions:
- https://github.com/hashicorp/terraform/issues/18632#issuecomment-412247266
- https://github.com/hashicorp/terraform/issues/13700
- https://www.terraform.io/docs/state/workspaces.html#when-to-use-multiple-workspaces
Our problem
The initial resource creation works just fine: our subnets are created and the database cluster starts up.
Our problems begin the next time we subsequently run terraform plan or terraform apply (without any changes to the files), at which point we see interesting things like:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
-/+ module.main.aws_rds_cluster.test_db (new resource required)
      id:                            "terraform-test-db-dev" => <computed> (forces new resource)
      availability_zones.#:          "3" => "1" (forces new resource)
      availability_zones.1924028850: "eu-west-1b" => "" (forces new resource)
      availability_zones.3953592328: "eu-west-1a" => "eu-west-1a"
      availability_zones.94988580:   "eu-west-1c" => "" (forces new resource)
and
-/+ module.main.aws_rds_cluster_instance.test_db (new resource required)
      id:                 "terraform-test-db-dev" => <computed> (forces new resource)
      cluster_identifier: "terraform-test-db-dev" => "${aws_rds_cluster.test_db.id}" (forces new resource)
It seems that the way we are handling this is making terraform believe the resources have changed to such an extent that it must destroy the existing ones and create brand new ones.
Configuration
variable "aws_availability_zones" {
description = "Run the EC2 Instances in these Availability Zones"
type = "list"
default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
variable "test_db_name" {
description = "Name of the RDS instance, must be unique per region and is provided by the module config"
}
variable "test_db_subnet_count" {
description = "Number of subnets to create, is provided by the module config"
}
resource "aws_security_group" "test_db_service" {
name = "${var.test_db_service_user_name}"
vpc_id = "${data.aws_vpc.vpc.id}"
}
resource "aws_security_group" "test_db" {
name = "${var.test_db_name}"
vpc_id = "${data.aws_vpc.vpc.id}"
}
resource "aws_security_group_rule" "test_db_ingress_app_server" {
security_group_id = "${aws_security_group.test_db.id}"
...
source_security_group_id = "${aws_security_group.test_db_service.id}"
}
variable "test_db_subnet_cidr_blocks" {
description = "Cidr block allocated to the subnets"
type = "list"
}
resource "aws_subnet" "test_db" {
count = "${var.test_db_subnet_count}"
vpc_id = "${data.aws_vpc.vpc.id}"
cidr_block = "${element(var.test_db_subnet_cidr_blocks, count.index)}"
availability_zone = "${element(var.aws_availability_zones, count.index)}"
}
resource "aws_db_subnet_group" "test_db" {
name = "${var.test_db_name}"
subnet_ids = ["${aws_subnet.test_db.*.id}"]
}
variable "test_db_backup_retention_period" {
description = "Number of days to keep the backup, is provided by the module config"
}
variable "test_db_backup_window" {
description = "Window during which the backup is done, is provided by the module config"
}
variable "test_db_maintenance_window" {
description = "Window during which the maintenance is done, is provided by the module config"
}
data "aws_secretsmanager_secret" "test_db_master_password" {
name = "terraform/db/test-db/root-password"
}
data "aws_secretsmanager_secret_version" "test_db_master_password" {
secret_id = "${data.aws_secretsmanager_secret.test_db_master_password.id}"
}
data "aws_iam_role" "rds-monitoring-role" {
name = "rds-monitoring-role"
}
resource "aws_rds_cluster" "test_db" {
cluster_identifier = "${var.test_db_name}"
engine = "aurora-mysql"
engine_version = "5.7.12"
# can only request to deploy in AZ's where there is a subnet in the subnet group.
availability_zones = "${slice(var.aws_availability_zones, 0, var.test_db_instance_count)}"
database_name = "${var.test_db_schema_name}"
master_username = "root"
master_password = "${data.aws_secretsmanager_secret_version.test_db_master_password.secret_string}"
preferred_backup_window = "${var.test_db_backup_window}"
preferred_maintenance_window = "${var.test_db_maintenance_window}"
backup_retention_period = "${var.test_db_backup_retention_period}"
db_subnet_group_name = "${aws_db_subnet_group.test_db.name}"
storage_encrypted = true
kms_key_id = "${data.aws_kms_key.kms_rds_key.arn}"
deletion_protection = true
enabled_cloudwatch_logs_exports = ["audit", "error", "general", "slowquery"]
vpc_security_group_ids = ["${aws_security_group.test_db.id}"]
final_snapshot_identifier = "test-db-final-snapshot"
}
variable "test_db_instance_count" {
description = "Number of instances to create, is provided by the module config"
}
resource "aws_rds_cluster_instance" "test_db" {
count = "${var.test_db_instance_count}"
identifier = "${var.test_db_name}"
cluster_identifier = "${aws_rds_cluster.test_db.id}"
availability_zone = "${element(var.aws_availability_zones, count.index)}"
instance_class = "db.t2.small"
db_subnet_group_name = "${aws_db_subnet_group.test_db.name}"
monitoring_interval = 60
engine = "aurora-mysql"
engine_version = "5.7.12"
monitoring_role_arn = "${data.aws_iam_role.rds-monitoring-role.arn}"
tags {
Name = "test_db-${count.index}"
}
}
My question is: is there a way to do this such that Terraform does not try to recreate the resources (e.g. ensuring that the availability zones of the cluster and the ids of the instances do not change every time we run terraform)?
As it turns out, simply removing the explicit availability zone definitions from aws_rds_cluster and aws_rds_cluster_instance makes this issue go away, and everything appears to be working as expected so far. See also https://github.com/terraform-providers/terraform-provider-aws/issues/7307#issuecomment-457441633
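For reference, a minimal sketch of the fix under the configuration shown above: the availability zone arguments are simply omitted, and the cluster derives its zones from the subnets in the subnet group.

# Sketch of the fixed resources: no availability_zones / availability_zone.
resource "aws_rds_cluster" "test_db" {
  cluster_identifier   = "${var.test_db_name}"
  engine               = "aurora-mysql"
  engine_version       = "5.7.12"
  db_subnet_group_name = "${aws_db_subnet_group.test_db.name}"

  # ... all other arguments as before, minus availability_zones ...
}

resource "aws_rds_cluster_instance" "test_db" {
  count              = "${var.test_db_instance_count}"
  identifier         = "${var.test_db_name}"
  cluster_identifier = "${aws_rds_cluster.test_db.id}"

  # ... all other arguments as before, minus availability_zone ...
}

The underlying behaviour (per the linked provider issue) is that AWS reports back the full set of zones covered by the subnet group, so an explicitly supplied subset shows up as drift on every subsequent plan. An alternative sometimes suggested is a lifecycle { ignore_changes = ["availability_zones"] } block on the cluster, but simply omitting the argument was enough in our case.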