How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We connect to the instances from GKE using a public IP and the Cloud SQL proxy container.

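To simplify our setup, we wanted to get rid of the proxy containers by moving to private IP. I tried following the Terraform guide. Creating a single instance works fine, but trying to create 5 instances simultaneously ends with 4 failed instances and one successful one.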

The error that appears in the Google Cloud Console for the failed instances is "An Unknown Error occurred".

Below is the code that reproduces it. Note the count = 5 line:

resource "google_compute_network" "private_network" {
  provider = "google-beta"

  name = "private-network"
}

resource "google_compute_global_address" "private_ip_address" {
  provider = "google-beta"

  name = "private-ip-address"
  purpose = "VPC_PEERING"
  address_type = "INTERNAL"
  prefix_length = 16
  network = "${google_compute_network.private_network.self_link}"
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network = "${google_compute_network.private_network.self_link}"
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}

resource "google_sql_database_instance" "instance" {
  provider = "google-beta"
  count = 5

  name = "private-instance-${count.index}"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

provider "google-beta" {
  version = "~> 2.5"
  credentials = "credentials.json"
  project = "PROJECT_ID"
  region = "us-central1"
  zone = "us-central1-a"
}

I tried several options:

Found an ugly but working solution. There is a bug in GCP: it does not prevent simultaneous creation of instances, even though that creation cannot be completed. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.

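One alternative is to add dependencies between the instances. This allows their creation to complete successfully. However, creating each instance takes several minutes, which adds up to a lot of time. If we instead add an artificial 60-second delay between instance creations, we manage to avoid the failures. Notes:

  • The number of seconds of delay needed depends on the instance tier. For example, 30 seconds were enough for db-f1-micro, but not for db-custom-1-3840.
  • I am not sure what the exact number of seconds needed for db-custom-1-3840 is. 30 seconds were not enough; 60 seconds were.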
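For comparison, the slower dependency-chaining alternative (no delays; each instance waits until the previous one is fully created) could be sketched as below. This is a minimal sketch in the same Terraform 0.11 syntax used in this post; it assumes the network, global address and private_vpc_connection resources from the question are defined, and the names chained_1/chained_2 are hypothetical.

```hcl
# Sketch (hypothetical names): strictly sequential creation via depends_on.
# Assumes google_compute_network.private_network and
# google_service_networking_connection.private_vpc_connection from the question.
resource "google_sql_database_instance" "chained_1" {
  provider = "google-beta"

  name             = "private-instance-chained-1"
  database_version = "POSTGRES_9_6"

  # First instance only waits for the private VPC peering.
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

resource "google_sql_database_instance" "chained_2" {
  provider = "google-beta"

  name             = "private-instance-chained-2"
  database_version = "POSTGRES_9_6"

  # Starts only after chained_1 has finished creating
  # (chained_1 itself already depends on the VPC connection).
  depends_on = ["google_sql_database_instance.chained_1"]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}
```

Total time grows with the full creation time of every instance in the chain, which is why the delay-based approach is used instead.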

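Here is a code sample that works around the issue. It shows only 2 instances because, due to depends_on limitations, I could not use the count feature, and the full code for 5 instances would be very long. It works the same way for 5 instances: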

resource "google_compute_network" "private_network" {
  provider = "google-beta"

  name = "private-network"
}

resource "google_compute_global_address" "private_ip_address" {
  provider = "google-beta"

  name = "private-ip-address"
  purpose = "VPC_PEERING"
  address_type = "INTERNAL"
  prefix_length = 16
  network = "${google_compute_network.private_network.self_link}"
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network = "${google_compute_network.private_network.self_link}"
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}

locals {
  db_instance_creation_delay_factor_seconds = 60
}

resource "null_resource" "delayer_1" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
  }
}

resource "google_sql_database_instance" "instance_1" {
  provider = "google-beta"

  name = "private-instance-delayed-1"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_1"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

resource "null_resource" "delayer_2" {
  depends_on = ["google_service_networking_connection.private_vpc_connection"]

  provisioner "local-exec" {
    command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
  }
}

resource "google_sql_database_instance" "instance_2" {
  provider = "google-beta"

  name = "private-instance-delayed-2"
  database_version = "POSTGRES_9_6"

  depends_on = [
    "google_service_networking_connection.private_vpc_connection",
    "null_resource.delayer_2"
  ]

  settings {
    tier = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled = "false"
      private_network = "${google_compute_network.private_network.self_link}"
    }
  }
}

provider "google-beta" {
  version = "~> 2.5"
  credentials = "credentials.json"
  project = "PROJECT_ID"
  region = "us-central1"
  zone = "us-central1-a"
}

provider "null" {
  version = "~> 1.0"
}
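On newer Terraform versions, the null_resource plus local-exec sleep delayer could likely be replaced by the time_sleep resource from the hashicorp/time provider, which does not rely on a local shell. This is a swapped-in technique, not what the answer above used; it assumes Terraform 0.13 or later (for the required_providers source syntax), and the name stagger_2 is hypothetical. The 60-second value comes from the notes above and may need tuning for your tier.

```hcl
# Assumption: Terraform >= 0.13 with the hashicorp/time provider available.
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Hypothetical replacement for null_resource.delayer_2: completes 60 s after
# the private VPC connection exists, without running a shell command.
resource "time_sleep" "stagger_2" {
  depends_on = [google_service_networking_connection.private_vpc_connection]

  create_duration = "60s"
}

# In instance_2, depend on the sleep instead of the null_resource:
#   depends_on = [time_sleep.stagger_2]
```

As with the null_resource version, each instance needs its own delay resource with a growing duration (0s, 60s, 120s, ...), because a depends_on on a whole counted resource would make every instance wait for the longest delay.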

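If anyone lands here with a slightly different situation (creating a google_sql_database_instance in a private network results in an "Unknown error"):

  1. Launch one Cloud SQL instance manually (this enables servicenetworking.googleapis.com and some other APIs for the project).
  2. Run your manifest.
  3. Terminate the instance created in step 1.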
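Since step 1 helps by enabling APIs as a side effect, a possibly cleaner alternative is to enable the Service Networking API from Terraform itself with google_project_service and make the peering connection wait for it. This is a sketch under the assumption that the disabled API is the actual cause in your project; it replaces the private_vpc_connection resource from the question, the name servicenetworking is hypothetical, and any other APIs you find missing could be enabled the same way.

```hcl
# Sketch: enable the Service Networking API instead of creating a throwaway
# Cloud SQL instance by hand. Uses the project from the google-beta provider.
resource "google_project_service" "servicenetworking" {
  provider = "google-beta"

  service = "servicenetworking.googleapis.com"

  # Leave the API enabled if this resource is ever destroyed.
  disable_on_destroy = false
}

resource "google_service_networking_connection" "private_vpc_connection" {
  provider = "google-beta"

  network                 = "${google_compute_network.private_network.self_link}"
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]

  # Do not attempt the VPC peering until the API is enabled.
  depends_on = ["google_project_service.servicenetworking"]
}
```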

That worked for me.

¯\_(ツ)_/¯

I came here with a slightly different situation, the same as @Grigorash Vasilij (creating a google_sql_database_instance in a private network results in an "Unknown error").

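I was using the UI to deploy a SQL instance on a private VPC, and for some reason that also gave me an "Unknown error". I finally solved the problem by using the gcloud command instead (why does it work while the UI doesn't? I don't know; maybe the UI does not do the same thing as the command):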

gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
       --network=[VPC_NETWORK_NAME] \
       --no-assign-ip

Follow this for more details.