Accessing a private GKE cluster via Cloud VPN
We have set up a GKE cluster with private and shared networking using Terraform:
Network configuration:
resource "google_compute_subnetwork" "int_kube02" {
  name          = "int-kube02"
  region        = var.region
  project       = "infrastructure"
  network       = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  ip_cidr_range = "10.23.5.0/24"

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.60.0.0/14" # 10.60 - 10.63
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.56.0.0/16"
  }
}
Cluster configuration:
resource "google_container_cluster" "gke_kube02" {
  name               = "kube02"
  location           = var.region
  initial_node_count = var.gke_kube02_num_nodes
  network            = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  subnetwork         = "projects/infrastructure/regions/europe-west3/subnetworks/int-kube02"

  master_authorized_networks_config {
    cidr_blocks {
      display_name = "admin vpn"
      cidr_block   = "10.42.255.0/24"
    }
    cidr_blocks {
      display_name = "monitoring server"
      cidr_block   = "10.42.4.33/32"
    }
    cidr_blocks {
      display_name = "cluster nodes"
      cidr_block   = "10.23.5.0/24"
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "192.168.23.0/28"
  }

  node_config {
    machine_type = "e2-highcpu-2"
    tags         = ["kube-no-external-ip"]
    metadata = {
      disable-legacy-endpoints = "true"
    }
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
The cluster is online and working fine. If I connect to one of the worker nodes, I can reach the API with curl:
curl -k https://192.168.23.2
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
I also see a healthy cluster when using SSH port forwarding:
❯ k get pods --all-namespaces --insecure-skip-tls-verify=true
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system event-exporter-gke-5479fd58c8-mv24r 2/2 Running 0 4h44m
kube-system fluentbit-gke-ckkwh 2/2 Running 0 4h44m
kube-system fluentbit-gke-lblkz 2/2 Running 0 4h44m
kube-system fluentbit-gke-zglv2 2/2 Running 4 4h44m
kube-system gke-metrics-agent-j72d9 1/1 Running 0 4h44m
kube-system gke-metrics-agent-ttrzk 1/1 Running 0 4h44m
kube-system gke-metrics-agent-wbqgc 1/1 Running 0 4h44m
kube-system kube-dns-697dc8fc8b-rbf5b 4/4 Running 5 4h44m
kube-system kube-dns-697dc8fc8b-vnqb4 4/4 Running 1 4h44m
kube-system kube-dns-autoscaler-844c9d9448-f6sqw 1/1 Running 0 4h44m
kube-system kube-proxy-gke-kube02-default-pool-2bf58182-xgp7 1/1 Running 0 4h43m
kube-system kube-proxy-gke-kube02-default-pool-707f5d51-s4xw 1/1 Running 0 4h43m
kube-system kube-proxy-gke-kube02-default-pool-bd2c130d-c67h 1/1 Running 0 4h43m
kube-system l7-default-backend-6654b9bccb-mw6bp 1/1 Running 0 4h44m
kube-system metrics-server-v0.4.4-857776bc9c-sq9kd 2/2 Running 0 4h43m
kube-system pdcsi-node-5zlb7 2/2 Running 0 4h44m
kube-system pdcsi-node-kn2zb 2/2 Running 0 4h44m
kube-system pdcsi-node-swhp9 2/2 Running 0 4h44m
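For reference, the port forwarding used above can be set up roughly like this (a sketch: the local port 8443 and the use of the node 10.23.5.216 as jump host are my assumptions, not stated in the question):

```shell
# Forward a local port through a worker node to the private control-plane endpoint
ssh -L 8443:192.168.23.2:443 me@10.23.5.216

# In a second terminal, point kubectl at the tunnel; TLS verification is skipped
# because the API server certificate does not cover 127.0.0.1
kubectl --server=https://127.0.0.1:8443 \
        --insecure-skip-tls-verify=true \
        get pods --all-namespaces
```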
So far, so good. I then set up the Cloud Router to announce the 192.168.23.0/28 network. This worked, and the route was propagated to our on-premises site via BGP: running show route 192.168.23.2 there shows that the correct route is advertised and installed.
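In Terraform, that custom advertisement can be expressed roughly as follows (a sketch: the router name, ASN, and the idea of advertising all subnets alongside the /28 are my assumptions; the question only states that the /28 is announced):

```hcl
resource "google_compute_router" "vpn_router" {
  name    = "vpn-router" # hypothetical name
  region  = var.region
  project = "infrastructure"
  network = "net-10-23-0-0-16"

  bgp {
    asn            = 64514 # hypothetical private ASN
    advertise_mode = "CUSTOM"

    # Advertise the VPC subnets plus the control-plane range, which lives in a
    # Google-managed peered VPC and is therefore not announced by default
    advertised_groups = ["ALL_SUBNETS"]
    advertised_ip_ranges {
      range = "192.168.23.0/28"
    }
  }
}
```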
However, when trying to reach the API from the monitoring server 10.42.4.33, I only run into timeouts. All three of Cloud VPN, Cloud Router, and the Kubernetes cluster run in europe-west3.
Pinging one of the worker nodes works perfectly fine, so general networking is not the problem:
[me@monitoring ~]$ ping 10.23.5.216
PING 10.23.5.216 (10.23.5.216) 56(84) bytes of data.
64 bytes from 10.23.5.216: icmp_seq=1 ttl=63 time=8.21 ms
64 bytes from 10.23.5.216: icmp_seq=2 ttl=63 time=7.70 ms
64 bytes from 10.23.5.216: icmp_seq=3 ttl=63 time=5.41 ms
64 bytes from 10.23.5.216: icmp_seq=4 ttl=63 time=7.98 ms
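ICMP to a node does not prove that traffic to the control-plane range is routed, so a plain TCP check from the monitoring server narrows the problem down (a sketch using bash's /dev/tcp; nc -vz would work equally well):

```shell
# Connects quickly if 192.168.23.2:443 is reachable; times out if the route
# or peering is missing (an authorization error would still connect at TCP level)
timeout 5 bash -c 'exec 3<>/dev/tcp/192.168.23.2/443' && echo "open" || echo "timeout/filtered"
```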
I could not find anything in Google's documentation that I might have missed. As far as I understand it, the cluster API should be reachable by now.
What could be missing, and why is the API unreachable via the VPN?
I had been missing the peering routes configuration documented here:
https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cp-on-prem-routing
resource "google_compute_network_peering_routes_config" "peer_kube02" {
  peering = google_container_cluster.gke_kube02.private_cluster_config[0].peering_name
  project = "infrastructure"
  network = "net-10-23-0-0-16"

  export_custom_routes = true
  import_custom_routes = false
}
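After applying this, the export flag can be checked on the peering that GKE created for the control plane (a sketch; the peering name itself is auto-generated by GKE):

```shell
# List the VPC's peerings and confirm that custom route export is now enabled
gcloud compute networks peerings list \
    --network=net-10-23-0-0-16 \
    --project=infrastructure
```

Exporting custom routes makes the on-premises ranges learned over the VPN visible to the Google-managed control-plane VPC, so replies from 192.168.23.2 to 10.42.4.33 can be routed back instead of being dropped.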