How to create a healthy VPC-Native GKE cluster with Terraform?
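Using Terraform, I am trying to create a VPC-native GKE cluster in a single zone (europe-north1-b) with a separate node pool. The GKE cluster and the node pool live in their own VPC network.
My code looks like this: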
resource "google_container_cluster" "gke_cluster" {
  description              = "GKE Cluster for personal projects"
  initial_node_count       = 1
  location                 = "europe-north1-b"
  name                     = "prod"
  network                  = google_compute_network.gke.self_link
  remove_default_node_pool = true
  subnetwork               = google_compute_subnetwork.gke.self_link

  ip_allocation_policy {
    cluster_secondary_range_name  = local.cluster_secondary_range_name
    services_secondary_range_name = local.services_secondary_range_name
  }
}

resource "google_compute_network" "gke" {
  auto_create_subnetworks         = false
  delete_default_routes_on_create = false
  description                     = "Compute Network for GKE nodes"
  name                            = "${terraform.workspace}-gke"
  routing_mode                    = "GLOBAL"
}

resource "google_compute_subnetwork" "gke" {
  name          = "prod-gke-subnetwork"
  ip_cidr_range = "10.255.0.0/16"
  region        = "europe-north1"
  network       = google_compute_network.gke.id

  secondary_ip_range {
    range_name    = local.cluster_secondary_range_name
    ip_cidr_range = "10.0.0.0/10"
  }

  secondary_ip_range {
    range_name    = local.services_secondary_range_name
    ip_cidr_range = "10.64.0.0/10"
  }
}

locals {
  cluster_secondary_range_name  = "cluster-secondary-range"
  services_secondary_range_name = "services-secondary-range"
}

resource "google_container_node_pool" "gke_node_pool" {
  cluster    = google_container_cluster.gke_cluster.name
  location   = "europe-north1-b"
  name       = terraform.workspace
  node_count = 1

  node_locations = [
    "europe-north1-b"
  ]

  node_config {
    disk_size_gb    = 100
    disk_type       = "pd-standard"
    image_type      = "cos_containerd"
    local_ssd_count = 0
    machine_type    = "g1-small"
    preemptible     = false
    service_account = google_service_account.gke_node_pool.email
  }
}

resource "google_service_account" "gke_node_pool" {
  account_id   = "${terraform.workspace}-node-pool"
  description  = "The default service account for pods to use in ${terraform.workspace}"
  display_name = "GKE Node Pool ${terraform.workspace} Service Account"
}

resource "google_project_iam_member" "gke_node_pool" {
  member = "serviceAccount:${google_service_account.gke_node_pool.email}"
  role   = "roles/viewer"
}
However, whenever I apply this Terraform code, I get the following error:
google_container_cluster.gke_cluster: Still creating... [24m30s elapsed]
google_container_cluster.gke_cluster: Still creating... [24m40s elapsed]
╷
│ Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but: component "kube-apiserver" from endpoint "gke-xxxxxxxxxxxxxxxxxxxx-yyyy" is unhealthy.
│
│ with google_container_cluster.gke_cluster,
│ on gke.tf line 1, in resource "google_container_cluster" "gke_cluster":
│ 1: resource "google_container_cluster" "gke_cluster" {
│
╵
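My cluster is then automatically deleted.
I can find no problems with my Terraform code or syntax, and I have searched Google Cloud Logging for a more detailed error message, without success.
So, how do I create a healthy VPC-native GKE cluster with Terraform?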
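It turns out the problem seems to have been the large subnetwork secondary ranges.
As shown in the question, I had these ranges:
- 10.0.0.0/10 for the cluster_secondary_range.
- 10.64.0.0/10 for the services_secondary_range.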
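These /10 CIDRs each cover 4194304 IP addresses, which I figured might be too large for Google/GKE to handle(?), especially since all the GKE documentation uses CIDRs covering much smaller cluster and services ranges.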
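The address count of a CIDR block is 2^(32 - prefix length), so a /10 holds 2^22 addresses. As a minimal sketch, the arithmetic can be checked with Terraform's built-in `split`, `tonumber` and `pow` functions; the local and output names below are hypothetical and not part of the original configuration.

```hcl
locals {
  # Hypothetical helper: the two secondary ranges from the question.
  original_secondary_ranges = ["10.0.0.0/10", "10.64.0.0/10"]

  # IPv4 addresses in a CIDR block = 2^(32 - prefix length).
  # split("/", cidr)[1] extracts the prefix length as a string, e.g. "10".
  original_range_sizes = {
    for cidr in local.original_secondary_ranges :
    cidr => pow(2, 32 - tonumber(split("/", cidr)[1]))
  }
}

output "original_range_sizes" {
  value = local.original_range_sizes
}
```

Evaluating `local.original_range_sizes` (for example in `terraform console`) should give 4194304 for each of the two /10 ranges.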
I decided to shrink these CIDR ranges to see whether that would help:
- 10.0.0.0/12 for the cluster_secondary_range.
- 10.16.0.0/12 for the services_secondary_range.
These /12 CIDRs each cover 1048576 IP addresses.
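Concretely, the change only touches the two `secondary_ip_range` blocks of the `google_compute_subnetwork` resource from the question; everything else stays as posted. A sketch of the updated resource follows (the comments are added for explanation and are not from the original post):

```hcl
resource "google_compute_subnetwork" "gke" {
  name          = "prod-gke-subnetwork"
  ip_cidr_range = "10.255.0.0/16" # primary range for the nodes, unchanged
  region        = "europe-north1"
  network       = google_compute_network.gke.id

  # Pod range: 10.0.0.0 - 10.15.255.255 (2^20 = 1048576 addresses)
  secondary_ip_range {
    range_name    = local.cluster_secondary_range_name
    ip_cidr_range = "10.0.0.0/12"
  }

  # Services range: 10.16.0.0 - 10.31.255.255 (2^20 = 1048576 addresses)
  secondary_ip_range {
    range_name    = local.services_secondary_range_name
    ip_cidr_range = "10.16.0.0/12"
  }
}
```

Neither /12 range overlaps the other or the 10.255.0.0/16 node range.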
After this change, my cluster was created successfully:
google_container_cluster.gke_cluster: Creation complete after 5m40s
I'm not sure why Google/GKE can't handle larger CIDR ranges for the cluster and services, but /12 is good enough for me and lets the cluster be created successfully.
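If you want to try other range sizes without recomputing addresses by hand, one option (my own suggestion, not part of the original answer) is to derive both secondary ranges from a parent block with Terraform's `cidrsubnet(prefix, newbits, netnum)` function. The variables `secondary_parent_cidr` and `secondary_newbits` are hypothetical; with their defaults the locals evaluate to the same 10.0.0.0/12 and 10.16.0.0/12 ranges used in the fix.

```hcl
variable "secondary_parent_cidr" {
  description = "Hypothetical parent block both secondary ranges are carved from"
  type        = string
  default     = "10.0.0.0/8"
}

variable "secondary_newbits" {
  description = "Bits added to the parent prefix length: /8 + 4 = /12"
  type        = number
  default     = 4
}

locals {
  # cidrsubnet returns the netnum-th subnet of the enlarged prefix:
  # with the defaults, netnum 0 -> 10.0.0.0/12 and netnum 1 -> 10.16.0.0/12.
  # Avoid netnum 15 at /12 (10.240.0.0/12), which contains the 10.255.0.0/16 node range.
  cluster_secondary_cidr  = cidrsubnet(var.secondary_parent_cidr, var.secondary_newbits, 0)
  services_secondary_cidr = cidrsubnet(var.secondary_parent_cidr, var.secondary_newbits, 1)
}
```

The two `secondary_ip_range` blocks would then set `ip_cidr_range = local.cluster_secondary_cidr` and `ip_cidr_range = local.services_secondary_cidr`, so changing `secondary_newbits` resizes both ranges at once.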