Accessing a private GKE cluster via Cloud VPN

We have set up a GKE cluster using Terraform, with private and shared networking:

Network configuration:

resource "google_compute_subnetwork" "int_kube02" {
  name          = "int-kube02"
  region        = var.region
  project       = "infrastructure"
  network       = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  ip_cidr_range = "10.23.5.0/24"
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.60.0.0/14" # 10.60 - 10.63
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.56.0.0/16"
  }
}

Cluster configuration:

resource "google_container_cluster" "gke_kube02" {
  name     = "kube02"
  location = var.region

  initial_node_count = var.gke_kube02_num_nodes

  network    = "projects/ninfrastructure/global/networks/net-10-23-0-0-16"
  subnetwork = "projects/infrastructure/regions/europe-west3/subnetworks/int-kube02"

  master_authorized_networks_config {
    cidr_blocks {
      display_name = "admin vpn"
      cidr_block   = "10.42.255.0/24"
    }
    cidr_blocks {
      display_name = "monitoring server"
      cidr_block   = "10.42.4.33/32"
    }
    cidr_blocks {
      display_name = "cluster nodes"
      cidr_block   = "10.23.5.0/24"
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true # API server is only reachable on its private IP
    master_ipv4_cidr_block  = "192.168.23.0/28"
  }

  node_config {
    machine_type = "e2-highcpu-2"

    tags = ["kube-no-external-ip"]
    metadata = {
      disable-legacy-endpoints = "true"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}

The cluster is online and works fine. If I connect to one of the worker nodes, I can use curl to reach the API:

curl -k https://192.168.23.2
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}

The 403 for system:anonymous is expected without credentials and confirms the API endpoint itself is reachable from inside the VPC. I also see a healthy cluster when using SSH port forwarding:

❯ k get pods --all-namespaces --insecure-skip-tls-verify=true
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kube-system   event-exporter-gke-5479fd58c8-mv24r                2/2     Running   0          4h44m
kube-system   fluentbit-gke-ckkwh                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-lblkz                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-zglv2                                2/2     Running   4          4h44m
kube-system   gke-metrics-agent-j72d9                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-ttrzk                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-wbqgc                            1/1     Running   0          4h44m
kube-system   kube-dns-697dc8fc8b-rbf5b                          4/4     Running   5          4h44m
kube-system   kube-dns-697dc8fc8b-vnqb4                          4/4     Running   1          4h44m
kube-system   kube-dns-autoscaler-844c9d9448-f6sqw               1/1     Running   0          4h44m
kube-system   kube-proxy-gke-kube02-default-pool-2bf58182-xgp7   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-707f5d51-s4xw   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-bd2c130d-c67h   1/1     Running   0          4h43m
kube-system   l7-default-backend-6654b9bccb-mw6bp                1/1     Running   0          4h44m
kube-system   metrics-server-v0.4.4-857776bc9c-sq9kd             2/2     Running   0          4h43m
kube-system   pdcsi-node-5zlb7                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-kn2zb                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-swhp9                                   2/2     Running   0          4h44m

So far, so good. Next I set up a Cloud Router to announce the 192.168.23.0/28 network. This was successful and the route was propagated to our on-premises site via BGP; running show route 192.168.23.2 there shows the correct route being published and installed.
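
The Cloud Router itself is not shown above; as a rough sketch, the custom advertisement could be expressed in Terraform along these lines (the router name and ASN are hypothetical, only the advertised control-plane range comes from the cluster config):

resource "google_compute_router" "vpn_router" {
  name    = "vpn-router" # hypothetical name
  region  = var.region
  project = "infrastructure"
  network = "net-10-23-0-0-16"

  bgp {
    asn            = 64514 # hypothetical private ASN
    advertise_mode = "CUSTOM"

    # Announce our own subnets plus the GKE control-plane range, which
    # lives in a peered VPC and is therefore not advertised by default.
    advertised_groups = ["ALL_SUBNETS"]
    advertised_ip_ranges {
      range       = "192.168.23.0/28"
      description = "kube02 control plane"
    }
  }
}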

When trying to reach the API from the monitoring server 10.42.4.33, however, I just run into timeouts. All three components, the Cloud VPN, the Cloud Router, and the Kubernetes cluster, run in europe-west3.

When I ping one of the worker nodes instead, it works perfectly fine, so general networking is working:

[me@monitoring ~]$ ping 10.23.5.216
PING 10.23.5.216 (10.23.5.216) 56(84) bytes of data.
64 bytes from 10.23.5.216: icmp_seq=1 ttl=63 time=8.21 ms
64 bytes from 10.23.5.216: icmp_seq=2 ttl=63 time=7.70 ms
64 bytes from 10.23.5.216: icmp_seq=3 ttl=63 time=5.41 ms
64 bytes from 10.23.5.216: icmp_seq=4 ttl=63 time=7.98 ms

Google's documentation did not turn up anything I might have missed. From my understanding, the cluster API should be reachable by now.

What could be missing, and why is the API not reachable via the VPN?

It turned out I had been missing the peering routes configuration documented here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cp-on-prem-routing

resource "google_compute_network_peering_routes_config" "peer_kube02" {
  peering = google_container_cluster.gke_kube02.private_cluster_config[0].peering_name
  project = "infrastructure"
  network = "net-10-13-0-0-16"

  export_custom_routes = true
  import_custom_routes = false
}
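
Background: for a private cluster, GKE places the control plane in a Google-managed VPC that is connected to the cluster's VPC via VPC Network Peering, and peering does not export custom (BGP-learned) routes by default. Without export_custom_routes = true, the control plane never learns a route back to the on-prem ranges, so packets from 10.42.4.33 reach it but the replies are dropped, which matches the observed timeouts.

The peering name the config refers to is generated by GKE; as a small sketch, it can also be surfaced as a Terraform output (using the resource names from the question) for inspecting the exported routes in the console:

output "kube02_peering_name" {
  # Name of the VPC peering GKE created towards the managed control-plane
  # network; useful for checking which routes are actually exported.
  value = google_container_cluster.gke_kube02.private_cluster_config[0].peering_name
}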