Kubernetes 监控指标服务器未启动
Kubernetes monitoring metrics server doesn't start
我有一个 kubeadm
具有一个主节点和一个工作节点的 Kubernetes 集群。
我正在尝试安装 Kubernetes 指标服务器,但不会收集任何信息。指标服务器内的消息是:
17:11:08.680724 1 serving.go:341] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1122 17:11:09.439494 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1122 17:11:09.439529 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1122 17:11:09.439559 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1122 17:11:09.439574 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1122 17:11:09.439585 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1122 17:11:09.439589 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1122 17:11:09.439880 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I1122 17:11:09.440065 1 secure_serving.go:197] Serving securely on [::]:4443
I1122 17:11:09.440599 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1122 17:11:09.540590 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1122 17:11:09.540672 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1122 17:11:09.540722 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
如果我正在尝试 kubectl top node
那么我会遇到以下问题:
W1122 19:36:25.770078 4684 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
这些是我的 kubernetes 文件:
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --kubelet-insecure-tls=true
- --metric-resolution=15s
image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
imagePullPolicy: IfNotPresent
name: metrics-server
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
问题是指标服务器未安排在主节点上。因此,我向 Metrics Server Deployment 添加了一个 toleration 和 Node scheduler:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/master
operator: In
values:
- ""
tolerations:
- key: node-role.kubernetes.io/master
operator: Equal
effect: NoSchedule
value: ""
- key: node.kubernetes.io/disk-pressure
operator: Equal
effect: NoSchedule
value: ""
我有一个 kubeadm
具有一个主节点和一个工作节点的 Kubernetes 集群。
我正在尝试安装 Kubernetes 指标服务器,但不会收集任何信息。指标服务器内的消息是:
17:11:08.680724 1 serving.go:341] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1122 17:11:09.439494 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1122 17:11:09.439529 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1122 17:11:09.439559 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1122 17:11:09.439574 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1122 17:11:09.439585 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1122 17:11:09.439589 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1122 17:11:09.439880 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I1122 17:11:09.440065 1 secure_serving.go:197] Serving securely on [::]:4443
I1122 17:11:09.440599 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1122 17:11:09.540590 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1122 17:11:09.540672 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1122 17:11:09.540722 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
如果我正在尝试 kubectl top node
那么我会遇到以下问题:
W1122 19:36:25.770078 4684 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
这些是我的 kubernetes 文件:
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --kubelet-insecure-tls=true
- --metric-resolution=15s
image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
imagePullPolicy: IfNotPresent
name: metrics-server
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
问题是指标服务器未安排在主节点上。因此,我向 Metrics Server Deployment 添加了一个 toleration 和 Node scheduler:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/master
operator: In
values:
- ""
tolerations:
- key: node-role.kubernetes.io/master
operator: Equal
effect: NoSchedule
value: ""
- key: node.kubernetes.io/disk-pressure
operator: Equal
effect: NoSchedule
value: ""