Metrics server not working in Kubernetes cluster

I have set up a Kubernetes cluster on Ubuntu 18+, and it works fine. I have now added metrics-server, but it is not working.

    # kubectl get apiservices

    v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   2d1h

    # kubectl describe apiservice v1beta1.metrics.k8s.io


    Message:               failing or missing response from https://10.106.145.77:443/apis/metrics.k8s.io/v1beta1: Get https://10.106.145.77:443/apis/metrics.k8s.io/v1beta1: dial tcp 10.106.145.77:443: connect: connection refused
    Reason:                FailedDiscoveryCheck

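I don't know why the connection is refused. Can anyone help me or give me some hints for solving this? I added RBAC to the cluster; could that be the problem? I have tried many solutions from the web, but none of them helped. I also tried editing the metrics-server Deployment YAML to add args, including the insecure TLS flag, but that did not help.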
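A `connection refused` on the Service ClusterIP often means that nothing is accepting connections where the Service forwards traffic, so it can help to look at each hop separately. The following is only a sketch of such checks, assuming the names from this question (`kube-system`, `metrics-server`); the function name `diagnose_metrics_server` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the pieces involved in the failing request
# (Service -> endpoints -> pod). Names are assumed from this question.
diagnose_metrics_server() {
    local ns="kube-system"
    local name="metrics-server"

    echo "== Service port mapping (port -> targetPort) =="
    kubectl -n "$ns" get service "$name" \
        -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'

    echo "== Endpoints behind the Service =="
    kubectl -n "$ns" get endpoints "$name"

    echo "== Recent metrics-server logs =="
    kubectl -n "$ns" logs "deployment/$name" --tail=20

    echo "== Raw request through the API aggregator =="
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1"
}
```

Calling `diagnose_metrics_server` prints the four sections in order. If the `targetPort` does not lead to the container's `--secure-port` (4443 in the Deployment shown here), or the endpoints list is empty, that would be one possible explanation for the refused connection.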

Other details

    # kubectl get all --all-namespaces | grep -i metrics-server

    kube-system            pod/metrics-server-7f55d7ccbb-th9w9              1/1     Running   0          21s
    kube-system            service/metrics-server              ClusterIP   10.106.145.77    <none>        443/TCP                                         26m
    kube-system            deployment.apps/metrics-server              1/1     1            1           25m
    kube-system            replicaset.apps/metrics-server-694db48df9              0         0         0       25m
    kube-system            replicaset.apps/metrics-server-7f55d7ccbb              1         1         1       21s


    # kubectl get -n kube-system deployment metrics-server -o yaml | grep -i args -A 10

          - args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP
            image: k8s.gcr.io/metrics-server-amd64:v0.3.6
            imagePullPolicy: Always
            name: metrics-server
            ports:
            - containerPort: 4443
              hostPort: 4443

YAML file:

    # kubectl get -n kube-system deployment metrics-server -o yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "2"
      creationTimestamp: "2020-01-29T14:49:06Z"
      generation: 2
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
      resourceVersion: "951901"
      selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
      uid: 54137f75-af0a-45a5-a508-f4c38ee9ea25
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          k8s-app: metrics-server
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            k8s-app: metrics-server
          name: metrics-server
        spec:
          containers:
          - args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP
            image: k8s.gcr.io/metrics-server-amd64:v0.3.6
            imagePullPolicy: Always
            name: metrics-server
            ports:
            - containerPort: 4443
              hostPort: 4443
              name: main-port
              protocol: TCP
            resources: {}
            securityContext:
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
          dnsPolicy: ClusterFirst
          hostNetwork: true
          nodeSelector:
            beta.kubernetes.io/os: linux
            kubernetes.io/arch: amd64
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: metrics-server
          serviceAccountName: metrics-server
          terminationGracePeriodSeconds: 30
          volumes:
          - emptyDir: {}
            name: tmp-dir
    status:
      availableReplicas: 1
      conditions:
      - lastTransitionTime: "2020-01-29T14:49:15Z"
        lastUpdateTime: "2020-01-29T14:49:15Z"
        message: Deployment has minimum availability.
        reason: MinimumReplicasAvailable
        status: "True"
        type: Available
      - lastTransitionTime: "2020-01-29T14:49:06Z"
        lastUpdateTime: "2020-01-29T15:14:26Z"
        message: ReplicaSet "metrics-server-7f55d7ccbb" has successfully progressed.
        reason: NewReplicaSetAvailable
        status: "True"
        type: Progressing
      observedGeneration: 2
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1

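Find the args section and try this. Adding `command` with `/metrics-server` resolved my issue. I also updated the preferred address types and then restarted the kubelet.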

    args:
    - --cert-dir=/tmp
    - --secure-port=4443
    command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
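After editing the Deployment, it may be worth confirming that the new pod rolled out and that the APIService no longer reports `FailedDiscoveryCheck`. A minimal sketch, assuming the default names from the question (`kube-system`, `metrics-server`, `v1beta1.metrics.k8s.io`); the function name `verify_metrics_server` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that metrics-server rolled out and that the
# aggregated API is reported as available. Names are assumed from the question.
verify_metrics_server() {
    local ns="kube-system"
    local deploy="metrics-server"
    local api="v1beta1.metrics.k8s.io"
    local available

    # Wait for the edited Deployment to finish rolling out.
    kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=120s || return 1

    # Read the Available condition of the APIService ("True" when healthy).
    available=$(kubectl get apiservice "$api" \
        -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')

    if [ "$available" = "True" ]; then
        echo "APIService $api is available"
    else
        echo "APIService $api is not available yet (status: ${available:-unknown})"
        return 1
    fi
}
```

If `verify_metrics_server` succeeds, `kubectl top nodes` should start returning data shortly afterwards, although the first metrics can take a little while to appear.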

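I ran into a similar problem with a 503 Service Unavailable error message and managed to resolve it by making the following changes.

In your components.yaml file, make sure the certificate path is correct:

    - --cert-dir=/etc/kubernetes/pki

    kubectl apply -f components.yaml

(Change the certificate path to this one instead of the default /tmp. This may depend on your setup, so try to find out where the PKI certificates are on your machine. Mine are in /etc/kubernetes/pki.)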