Kubernetes metrics server not running

I installed metrics server on my local k8s cluster on VirtualBox, following https://github.com/kubernetes-sigs/metrics-server#installation

But the metrics server pod is in CrashLoopBackOff:

metrics-server-844d9574cf-bxdk7      0/1     CrashLoopBackOff   28         12h     10.46.0.1      kubenode02   <none>           <none>

Events from the pod description:

Events:
  Type     Reason          Age                    From                 Message
  ----     ------          ----                   ----                 -------
  Normal   Scheduled       <unknown>                                   Successfully assigned kube-system/metrics-server-844d9574cf-bxdk7 to kubenode02
  Normal   Created         12h (x3 over 12h)      kubelet, kubenode02  Created container metrics-server
  Normal   Started         12h (x3 over 12h)      kubelet, kubenode02  Started container metrics-server
  Normal   Killing         12h (x2 over 12h)      kubelet, kubenode02  Container metrics-server failed liveness probe, will be restarted
  Warning  Unhealthy       12h (x7 over 12h)      kubelet, kubenode02  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy       12h (x7 over 12h)      kubelet, kubenode02  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled          12h (x7 over 12h)      kubelet, kubenode02  Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.0" already present on machine
  Warning  BackOff         12h (x35 over 12h)     kubelet, kubenode02  Back-off restarting failed container
  Normal   SandboxChanged  55m (x22 over 59m)     kubelet, kubenode02  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          55m                    kubelet, kubenode02  Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.0" already present on machine
  Normal   Created         55m                    kubelet, kubenode02  Created container metrics-server
  Normal   Started         55m                    kubelet, kubenode02  Started container metrics-server
  Warning  Unhealthy       29m (x35 over 55m)     kubelet, kubenode02  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff         4m45s (x202 over 54m)  kubelet, kubenode02  Back-off restarting failed container

Logs from the metrics-server deployment, retrieved with kubectl logs deployment/metrics-server -n kube-system:

E1110 12:56:25.249873       1 pathrecorder.go:107] registered "/metrics" from goroutine 1 [running]:
runtime/debug.Stack(0x1942e80, 0xc0006e8db0, 0x1bb58b5)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).trackCallers(0xc0004f73b0, 0x1bb58b5, 0x8)
        /go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/mux/pathrecorder.go:109 +0x86
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).Handle(0xc0004f73b0, 0x1bb58b5, 0x8, 0x1e96f00, 0xc0005dc8d0)
        /go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/mux/pathrecorder.go:173 +0x84
k8s.io/apiserver/pkg/server/routes.MetricsWithReset.Install(0xc0004f73b0)
        /go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/routes/metrics.go:43 +0x5d
k8s.io/apiserver/pkg/server.installAPI(0xc00000a1e0, 0xc00013d8c0)
        /go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/config.go:711 +0x6c
k8s.io/apiserver/pkg/server.completedConfig.New(0xc00013d8c0, 0x1f099c0, 0xc000697090, 0x1bbdb5a, 0xe, 0x1ef29e0, 0x2cef248, 0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apiserver@v0.19.2/pkg/server/config.go:657 +0xb45
sigs.k8s.io/metrics-server/pkg/server.Config.Complete(0xc00013d8c0, 0xc00013cb40, 0xc00013d680, 0xdf8475800, 0xc92a69c00, 0x0, 0x0, 0xdf8475800)
        /go/src/sigs.k8s.io/metrics-server/pkg/server/config.go:52 +0x312
sigs.k8s.io/metrics-server/cmd/metrics-server/app.runCommand(0xc0001140b0, 0xc0000a65a0, 0x0, 0x0)
        /go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/app/start.go:66 +0x157
sigs.k8s.io/metrics-server/cmd/metrics-server/app.NewMetricsServerCommand.func1(0xc000618b00, 0xc0002c3a80, 0x0, 0x4, 0x0, 0x0)
        /go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/app/start.go:37 +0x33
github.com/spf13/cobra.(*Command).execute(0xc000618b00, 0xc000100060, 0x4, 0x4, 0xc000618b00, 0xc000100060)
        /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0xc000618b00, 0xc00012a120, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main()
        /go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/metrics-server.go:38 +0xae
I1110 12:56:25.384926       1 secure_serving.go:197] Serving securely on [::]:4443
I1110 12:56:25.384972       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1110 12:56:25.384979       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1110 12:56:25.384996       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I1110 12:56:25.385018       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1110 12:56:25.385069       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1110 12:56:25.385083       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1110 12:56:25.385105       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1110 12:56:25.385117       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E1110 12:56:25.385521       1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node kubenode02: unable to fetch metrics from node kubenode02: Get "https://192.168.56.4:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.4 because it doesn't contain any IP SANs, unable to fully scrape metrics from node kubenode01: unable to fetch metrics from node kubenode01: Get "https://192.168.56.3:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.3 because it doesn't contain any IP SANs, unable to fully scrape metrics from node kubemaster: unable to fetch metrics from node kubemaster: Get "https://192.168.56.2:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.2 because it doesn't contain any IP SANs]
I1110 12:56:25.485100       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I1110 12:56:25.485359       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I1110 12:56:25.485398       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file

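The error is caused by the kubelets' self-signed TLS certificates: metrics-server connects to each node by IP address, and the certificates contain no IP SANs, so validation fails. Adding - --kubelet-insecure-tls to the container args in components.yaml and re-applying it to the K8s cluster resolves the issue. Note that this flag makes metrics-server skip verification of the kubelet certificates.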

Reference: https://github.com/kubernetes-sigs/metrics-server#configuration
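If you would rather not edit components.yaml by hand, the same flag can probably be appended to the running Deployment with a JSON patch. This is only a sketch: it assumes the stock manifest layout (Deployment metrics-server in kube-system, one container that already has an args list), and the function name is made up for illustration.

```shell
# Append --kubelet-insecure-tls to the args of the first container in the
# metrics-server Deployment. Assumes the stock components.yaml layout:
# namespace kube-system, a single container, and an existing args list.
# The function name is illustrative, not part of any tool.
enable_kubelet_insecure_tls() {
  kubectl patch deployment metrics-server -n kube-system --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}
```

Calling enable_kubelet_insecure_tls and then kubectl rollout status deployment/metrics-server -n kube-system should show whether the new pod becomes ready. Editing components.yaml stays the more reproducible route, since a patch is lost when the manifest is re-applied.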

In my opinion, a better approach is to reissue the certificates for the nodes (workers) with their IP addresses added to the SANs.

cat w2k.csr.json

{
  "hosts": [
    "w2k",
    "w2k.rezerw.at",
    "172.16.8.113"
  ],
  "CN": "system:node:w2k",
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "O": "system:nodes"
    }
  ]
}

And the commands:

cat w2k.csr.json|cfssl genkey - | cfssljson -bare w2k

cat w2k.csr | base64 | tr -d '\n'
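For setups without cfssl, roughly the same CSR can be produced with openssl alone, and it is worth checking that the IP really ends up in the SANs before submitting anything. A minimal sketch, assuming OpenSSL 1.1.1 or newer (needed for -addext); the scratch directory is just a precaution and not part of the original procedure.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ECDSA P-256 key, matching "algo": "ecdsa", "size": 256 in w2k.csr.json.
openssl ecparam -name prime256v1 -genkey -noout -out w2k-key.pem

# CSR with the same subject and SANs as the JSON file above.
openssl req -new -key w2k-key.pem -out w2k.csr \
  -subj "/O=system:nodes/CN=system:node:w2k" \
  -addext "subjectAltName=DNS:w2k,DNS:w2k.rezerw.at,IP:172.16.8.113"

# Print the SANs the CSR actually carries; the node IP should be listed.
sans=$(openssl req -in w2k.csr -noout -text | grep -A1 'Subject Alternative Name')
echo "$sans"
```

The file names match what cfssljson -bare w2k writes (w2k.csr and w2k-key.pem), so the remaining steps should apply unchanged.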

The base64 command prints a string; put it in spec.request of a new YAML file:

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: worker01
spec:
  request: "LS0tLS1CRUdJ0tLS0tCg=="
  signerName: kubernetes.io/kubelet-serving
  usages:
  - digital signature
  - key encipherment
  - server auth
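Instead of pasting the base64 string by hand, the manifest can also be generated from the CSR file. A small sketch along those lines; the function and argument names are my own, and the manifest fields are copied from the YAML above.

```shell
# Print a CertificateSigningRequest manifest for a given object name and a
# PEM-encoded CSR file. Function and argument names are illustrative.
render_csr_manifest() {
  name=$1
  csr_file=$2
  # base64 may wrap long output; strip newlines so the value stays on one line.
  request=$(base64 < "$csr_file" | tr -d '\n')
  cat <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${name}
spec:
  request: ${request}
  signerName: kubernetes.io/kubelet-serving
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF
}
```

For example, render_csr_manifest worker01 w2k.csr > w2k.csr.yaml writes the file used in the next step.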

Apply it:

kubectl apply -f w2k.csr.yaml
certificatesigningrequest.certificates.k8s.io/worker01 configured

Approve the CSR:

kubectl certificate approve worker01
certificatesigningrequest.certificates.k8s.io/worker01 approved

Fetch the certificate and place it, together with its key, on the node in /var/lib/kubelet/pki:

root@w2k:/var/lib/kubelet/pki# mv w2k-key.pem  kubelet.key
root@w2k:/var/lib/kubelet/pki# mv w2k-cert.pem kubelet.crt
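The fetch step itself is not spelled out above. Once the CSR is approved, the signed certificate is stored base64-encoded in the object's status, so something like the following should retrieve it. This is a sketch assuming kubectl is configured for the cluster; the function name and file names are illustrative.

```shell
# Write the signed certificate of an approved CSR object to a PEM file.
# Assumes kubectl points at the cluster; names here are illustrative.
fetch_signed_cert() {
  csr_name=$1
  out_file=$2
  kubectl get csr "$csr_name" -o jsonpath='{.status.certificate}' \
    | base64 --decode > "$out_file"
}
```

With the manifest above this would be fetch_signed_cert worker01 w2k-cert.pem. Running openssl x509 -in w2k-cert.pem -noout -text afterwards shows whether the node IP is listed under Subject Alternative Name. After the files are renamed, the kubelet typically has to be restarted to pick them up (for example systemctl restart kubelet on a systemd host, which is an assumption about the node setup).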

https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#create-a-certificate-signing-request-object-to-send-to-the-kubernetes-api