Terraform recreates resources with dynamic AWS RDS instance counts

I have a question about creating AWS RDS clusters and instances.

Environment

We recently tried this with:

Terraform v0.11.11
provider.aws v1.41.0

Background

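We are creating some AWS RDS databases. Our requirement is that some environments (for example staging) may run fewer instances than others (for example production). With that in mind, and not wanting a completely different set of Terraform files per environment, we decided to define the database resources only once and use a variable for the number of instances, which is set in our staging.tf and production.tf files respectively.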

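Another possible "quirk" of our setup is that the VPC the subnets live in is not defined in Terraform. The VPC already exists because it was created manually in the AWS console, so it is brought in as a data source, while the subnets for RDS are defined in Terraform. This is also dynamic in the sense that some environments may have 3 subnets (1 per AZ) while others may have only 2. To achieve this we again use count-based iteration, as shown in the Configuration section further down.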
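The configuration further down refers to data.aws_vpc.vpc without showing its definition. A minimal sketch of what such a lookup might look like in Terraform 0.11 syntax follows. It assumes the manually created VPC carries a Name tag equal to the vpc_name value passed to the module; that tagging convention is an assumption, not something stated in the question.

```hcl
# Hypothetical sketch: look up an existing, manually created VPC by its Name tag.
# Assumes the VPC is tagged Name = <vpc_name>; adjust the filter to your own setup.
variable "vpc_name" {
  description = "Name tag of the pre-existing VPC, is provided by the module config"
}

data "aws_vpc" "vpc" {
  tags = {
    Name = "${var.vpc_name}"
  }
}
```

Resources in the module can then reference the VPC as "${data.aws_vpc.vpc.id}", as the subnet and security group resources in the configuration do.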

Structure

|-/environments
     -/staging
         -staging.tf
     -/production
         -production.tf
|- /resources
     - database.tf

Example environment file

terraform {
  backend "s3" {
    bucket         = "my-bucket-dev"
    key            = "terraform"
    region         = "eu-west-1"
    encrypt        = "true"
    acl            = "private"
    dynamodb_table = "terraform-state-locking"
  }

  required_version = "~> 0.11.8"
}

provider "aws" {
  access_key          = "${var.access_key}"
  secret_key          = "${var.secret_key}"
  region              = "${var.region}"
  version             = "~> 1.33"
  allowed_account_ids = ["XXX"]
}

module "main" {
  source                                  = "../../resources"
  vpc_name                                = "test"
  test_db_name                    = "terraform-test-db-dev"
  test_db_instance_count          = 1
  test_db_backup_retention_period = 7
  test_db_backup_window           = "00:57-01:27"
  test_db_maintenance_window      = "tue:04:40-tue:05:10"
  test_db_subnet_count            = 2
  test_db_subnet_cidr_blocks      = ["10.2.4.0/24", "10.2.5.0/24"]
}
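For comparison, a production environment file could call the same module with larger counts. The sketch below is illustrative only: the VPC name, database name and CIDR blocks are hypothetical values that do not come from the question.

```hcl
# environments/production/production.tf (hypothetical values throughout)
module "main" {
  source                          = "../../resources"
  vpc_name                        = "production"             # assumed VPC name
  test_db_name                    = "terraform-test-db-prod" # assumed DB name
  test_db_instance_count          = 3                        # more instances than staging
  test_db_backup_retention_period = 7
  test_db_backup_window           = "00:57-01:27"
  test_db_maintenance_window      = "tue:04:40-tue:05:10"
  test_db_subnet_count            = 3                        # one subnet per AZ
  test_db_subnet_cidr_blocks      = ["10.3.4.0/24", "10.3.5.0/24", "10.3.6.0/24"]
}
```

Only the values differ between environments; the resource definitions in /resources stay the same.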

We chose this module-based structure for environment isolation mainly on the basis of earlier discussions of the topic.

Our problem

The initial resource creation works fine: our subnets are created and the database cluster starts up.

Our problems start the next time we run terraform plan or terraform apply (without having changed any files), at which point we see interesting things such as:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:
module.main.aws_rds_cluster.test_db (new resource required)
id: "terraform-test-db-dev" => (forces new resource)
availability_zones.#: "3" => "1" (forces new resource)
availability_zones.1924028850: "eu-west-1b" => "" (forces new resource)
availability_zones.3953592328: "eu-west-1a" => "eu-west-1a"
availability_zones.94988580: "eu-west-1c" => "" (forces new resource)

module.main.aws_rds_cluster_instance.test_db (new resource required)
id: "terraform-test-db-dev" => (forces new resource)
cluster_identifier: "terraform-test-db-dev" => "${aws_rds_cluster.test_db.id}" (forces new resource)

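The way we have approached this seems to make Terraform believe the resources have changed so much that it must destroy the existing ones and create brand-new ones.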

Configuration

variable "aws_availability_zones" {
  description = "Run the EC2 Instances in these Availability Zones"
  type        = "list"
  default     = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}

variable "test_db_name" {
  description = "Name of the RDS instance, must be unique per region and is provided by the module config"
}

variable "test_db_subnet_count" {
  description = "Number of subnets to create, is provided by the module config"
}

resource "aws_security_group" "test_db_service" {
  name   = "${var.test_db_service_user_name}"
  vpc_id = "${data.aws_vpc.vpc.id}"
}

resource "aws_security_group" "test_db" {
  name   = "${var.test_db_name}"
  vpc_id = "${data.aws_vpc.vpc.id}"
}

resource "aws_security_group_rule" "test_db_ingress_app_server" {
  security_group_id        = "${aws_security_group.test_db.id}"
...
  source_security_group_id = "${aws_security_group.test_db_service.id}"
}

variable "test_db_subnet_cidr_blocks" {
  description = "Cidr block allocated to the subnets"
  type        = "list"
}

resource "aws_subnet" "test_db" {
  count             = "${var.test_db_subnet_count}"
  vpc_id            = "${data.aws_vpc.vpc.id}"
  cidr_block        = "${element(var.test_db_subnet_cidr_blocks, count.index)}"
  availability_zone = "${element(var.aws_availability_zones, count.index)}"
}

resource "aws_db_subnet_group" "test_db" {
  name       = "${var.test_db_name}"
  subnet_ids = ["${aws_subnet.test_db.*.id}"]
}

variable "test_db_backup_retention_period" {
  description = "Number of days to keep the backup, is provided by the module config"
}

variable "test_db_backup_window" {
  description = "Window during which the backup is done, is provided by the module config"
}

variable "test_db_maintenance_window" {
  description = "Window during which the maintenance is done, is provided by the module config"
}

data "aws_secretsmanager_secret" "test_db_master_password" {
  name = "terraform/db/test-db/root-password"
}

data "aws_secretsmanager_secret_version" "test_db_master_password" {
  secret_id = "${data.aws_secretsmanager_secret.test_db_master_password.id}"
}

data "aws_iam_role" "rds-monitoring-role" {
  name = "rds-monitoring-role"
}

resource "aws_rds_cluster" "test_db" {
  cluster_identifier = "${var.test_db_name}"
  engine             = "aurora-mysql"
  engine_version     = "5.7.12"

  # can only request to deploy in AZ's where there is a subnet in the subnet group.
  availability_zones              = "${slice(var.aws_availability_zones, 0, var.test_db_instance_count)}"
  database_name                   = "${var.test_db_schema_name}"
  master_username                 = "root"
  master_password                 = "${data.aws_secretsmanager_secret_version.test_db_master_password.secret_string}"
  preferred_backup_window         = "${var.test_db_backup_window}"
  preferred_maintenance_window    = "${var.test_db_maintenance_window}"
  backup_retention_period         = "${var.test_db_backup_retention_period}"
  db_subnet_group_name            = "${aws_db_subnet_group.test_db.name}"
  storage_encrypted               = true
  kms_key_id                      = "${data.aws_kms_key.kms_rds_key.arn}"
  deletion_protection             = true
  enabled_cloudwatch_logs_exports = ["audit", "error", "general", "slowquery"]
  vpc_security_group_ids          = ["${aws_security_group.test_db.id}"]
  final_snapshot_identifier       = "test-db-final-snapshot"
}

variable "test_db_instance_count" {
  description = "Number of instances to create, is provided by the module config"
}

resource "aws_rds_cluster_instance" "test_db" {
  count                = "${var.test_db_instance_count}"
  identifier           = "${var.test_db_name}"
  cluster_identifier   = "${aws_rds_cluster.test_db.id}"
  availability_zone    = "${element(var.aws_availability_zones, count.index)}"
  instance_class       = "db.t2.small"
  db_subnet_group_name = "${aws_db_subnet_group.test_db.name}"
  monitoring_interval  = 60
  engine               = "aurora-mysql"
  engine_version       = "5.7.12"
  monitoring_role_arn  = "${data.aws_iam_role.rds-monitoring-role.arn}"

  tags {
    Name = "test_db-${count.index}"
  }
}

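My question is: is there a way to achieve this so that Terraform does not try to recreate the resources (for example, by making sure the cluster's availability zones and the instance IDs do not change every time we run Terraform)?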

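It turns out that simply removing the explicit availability zone definitions from aws_rds_cluster and aws_rds_cluster_instance makes this problem go away, and so far everything appears to work as expected. This matches the plan output above: the configuration requested a single availability zone, but the cluster reports three, so every plan sees a difference that forces a new cluster, and the instances are replaced along with it because their cluster_identifier depends on the new cluster. See also https://github.com/terraform-providers/terraform-provider-aws/issues/7307#issuecomment-457441633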
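A trimmed sketch of the two resources after that change might look like the following. Only a subset of the original arguments is repeated here; everything not shown is assumed to stay exactly as in the configuration above.

```hcl
resource "aws_rds_cluster" "test_db" {
  cluster_identifier   = "${var.test_db_name}"
  engine               = "aurora-mysql"
  engine_version       = "5.7.12"
  db_subnet_group_name = "${aws_db_subnet_group.test_db.name}"

  # availability_zones intentionally omitted: the cluster reports three AZs,
  # so a shorter list in the configuration shows up as a diff that forces
  # a new resource on every plan.

  # ... remaining arguments unchanged from the original configuration ...
}

resource "aws_rds_cluster_instance" "test_db" {
  count                = "${var.test_db_instance_count}"
  identifier           = "${var.test_db_name}"
  cluster_identifier   = "${aws_rds_cluster.test_db.id}"
  instance_class       = "db.t2.small"
  db_subnet_group_name = "${aws_db_subnet_group.test_db.name}"
  engine               = "aurora-mysql"
  engine_version       = "5.7.12"

  # availability_zone intentionally omitted: placement is left to RDS within
  # the subnets of the DB subnet group.
  # Note (not part of the original fix): with count > 1 each instance needs a
  # distinct identifier, for example by appending count.index.

  # ... remaining arguments unchanged from the original configuration ...
}
```

If the availability zones do need to be pinned, two workarounds often used instead are listing all three zones explicitly or adding a lifecycle block that ignores changes to availability_zones. Neither was tested in the setup described here.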