Metrics-Server: Node had no addresses that matched types [InternalIP]

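I'm using Rancher 2.5.8 to manage my Kubernetes clusters. Today I created a new cluster and everything works as expected, except for the metrics-server. Its status is always "CrashLoopBackOff", and the logs show the following: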

E0519 11:46:39.225804       1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node worker1: unable to fetch metrics from node worker1: unable to extract connection information for node "worker1": node worker1 had no addresses that matched types [InternalIP], unable to fully scrape metrics from node worker2: unable to fetch metrics from node worker2: unable to extract connection information for node "worker2": node worker2 had no addresses that matched types [InternalIP], unable to fully scrape metrics from node worker3: unable to fetch metrics from node worker3: unable to extract connection information for node "worker3": node worker3 had no addresses that matched types [InternalIP], unable to fully scrape metrics from node main1: unable to fetch metrics from node main1: unable to extract connection information for node "main1": node main1 had no addresses that matched types [InternalIP]]
I0519 11:46:39.228205       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0519 11:46:39.228222       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0519 11:46:39.228290       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.228301       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.228310       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.228314       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.229241       1 secure_serving.go:197] Serving securely on [::]:4443
I0519 11:46:39.229280       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0519 11:46:39.229302       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0519 11:46:39.328399       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0519 11:46:39.328428       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0519 11:46:39.328505       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController

Does anyone know how I can fix this so that the metrics-server stops crashing?

Here is the output of kubectl get nodes worker1 -oyaml:

status:
  addresses:
  - address: worker1
    type: Hostname
  - address: 65.21.<any>.<ip>
    type: ExternalIP

The problem lies in the metrics-server configuration.

The metrics-server is configured with --kubelet-preferred-address-types=InternalIP, but the worker nodes do not list any InternalIP address:

$ kubectl get nodes worker1 -oyaml
[...]
status:
  addresses:
  - address: worker1
    type: Hostname
  - address: 65.21.<any>.<ip>
    type: ExternalIP

One solution is to set --kubelet-preferred-address-types=ExternalIP in the metrics-server deployment YAML.

A probably better solution is to configure it the same way as the official metrics-server deployment YAML (source):

- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

As stated in the metrics-server configuration docs:

--kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
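To make the effect of the priority list concrete, here is a minimal Python sketch of how such a lookup could behave. It is not the metrics-server's actual code: the function name pick_address and the data layout are made up for illustration, and the only assumption is that the first address type in the preference list that the node actually reports wins. The address placeholder is copied as-is from the node output above.

```python
from typing import Dict, List, Optional


def pick_address(addresses: List[Dict[str, str]],
                 preferred_types: List[str]) -> Optional[str]:
    """Return the first node address whose type appears in preferred_types,
    trying the types in priority order. Hypothetical helper, for illustration only."""
    for wanted in preferred_types:
        for entry in addresses:
            if entry["type"] == wanted:
                return entry["address"]
    # No address matched any preferred type: this corresponds to the
    # "had no addresses that matched types" error in the logs.
    return None


# Addresses as reported in status.addresses for worker1 (IP left elided as in the post).
worker1 = [
    {"type": "Hostname", "address": "worker1"},
    {"type": "ExternalIP", "address": "65.21.<any>.<ip>"},
]

# Only InternalIP allowed: nothing matches.
print(pick_address(worker1, ["InternalIP"]))

# InternalIP is missing, so the lookup falls through to ExternalIP.
print(pick_address(worker1, ["InternalIP", "ExternalIP", "Hostname"]))
```

With only InternalIP in the list the lookup finds nothing, which is the failing case from the question. Adding ExternalIP and Hostname as fallbacks lets the same node resolve to its external address while still preferring an InternalIP on nodes that have one.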