Prometheus on GKE Autopilot?

Currently, the kubernetes-nodes job in my Prometheus configuration scrapes the endpoint /api/v1/nodes/gk3-<cluster name>-default-pool-<something arbitrary>/proxy/metrics.

The problem is that when I try it manually in Postman, I get a 403 error:

    GKEAutopilot authz: cluster scoped resource "nodes/proxy" is managed and access is denied

How can I get around this on GKE Autopilot?

The Autopilot documentation doesn't mention the node proxy API specifically, but this is in the limitations section:

Most external monitoring tools require access that is restricted. Solutions from several Google Cloud partners are available for use on Autopilot, however not all are supported, and custom monitoring tools cannot be installed on Autopilot clusters.

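Given that port-forwarding and all other node-level access is restricted, this most likely isn't available. It's not clear whether Autopilot even uses Kubelet at all, and they probably aren't going to tell you.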

End-of-year update:

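This mostly works now. Autopilot has added support for things like cluster-scoped objects and webhooks. You do need to reconfigure any install manifests so they don't touch the kube-system namespace, since that is still locked down, but you can get most of it working if you hammer on it enough.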

I created a firewall rule to allow ingress traffic to ports 10250-10255 (kubelet):

    $ gcloud compute firewall-rules create test-kubelet-ingress --allow tcp:10250-10255 --source-ranges="0.0.0.0/0"

Then I ran the following to make sure the user can create nodes/proxy:

    $ kubectl config view
    $ kubectl get all --all-namespaces
    $ kubectl create clusterrolebinding autopilot-cluster-1 --clusterrole=k8-cluster-1 --user=infosys-khajashaik@premium-cloud-support.com

Checking:

    $ kubectl auth can-i create nodes/proxy
    Warning: resource 'nodes' is not namespace scoped
    yes

Then I called the kubelet directly:

    $ curl -k https://{NODE_PUBLIC_IP}:10250/run/kube-system/{POD_NAME}/netd -d "cmd=ls" --header "Authorization: Bearer $TOKEN"

where:

    TOKEN = <auto generated token in local kubeconfig>
    NODE_PUBLIC_IP = <the public ip of the node>
    POD_NAME = <netd pod name in the node>

So even though the user has the permission in the kube-apiserver, the kubelet denies creating "nodes/proxy".
If nodes/proxy is removed from the authz, creating the proxy succeeds:

    $ curl -k https://35.202.254.215:10250/run/kube-system/netd-ff5vr/netd -d "cmd=ls" --header "Authorization: Bearer $TOKEN"

GKE Autopilot appears to deny access to "nodes/proxy".

The Kubelet metrics themselves do seem to be available, though. For example, you can access them from within the cluster:

curl  [Node_Internal_IP]:10255/metrics

I ended up scraping the Kubelet directly, rather than through the proxy, using this scrape config:

- job_name: kubernetes-nodes
  kubernetes_sd_configs:
    - role: node
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  scheme: https
  metrics_path: /metrics/cadvisor

You need this rule in an RBAC ClusterRole:

- apiGroups: [""]
  resources: ["nodes/metrics"]
  verbs: ["get"]
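For context, a complete manifest around that rule might look like the sketch below. This is only an illustration: the names kubelet-metrics-reader, prometheus and monitoring are hypothetical and should be replaced with whatever your Prometheus deployment uses. The extra rule on nodes is an assumption, based on the node role of kubernetes_sd_configs needing to list and watch nodes; your existing Prometheus ClusterRole may already grant it.

```yaml
# Sketch only: all names below are hypothetical placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-metrics-reader
rules:
  # Lets Prometheus read the Kubelet /metrics/* endpoints directly.
  - apiGroups: [""]
    resources: ["nodes/metrics"]
    verbs: ["get"]
  # Assumed to be needed for node service discovery (role: node).
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet-metrics-reader
subjects:
  # The service account the Prometheus pod runs as (hypothetical).
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

Neither object lives in kube-system, so it should not run into the namespace lock mentioned earlier in the thread.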

With the above, container resource metrics can be scraped from the Kubelet in a GKE Autopilot cluster.