Dev k8s master shows high load and `kubectl get pods` returns no output
My dev k8s master is under unusually high load, and as a result I get no output when listing pods:
admin@ip-172-20-49-150:~$ kubectl get po -n cog-stage
^C
admin@ip-172-20-49-150:~$
admin@ip-172-20-49-150:~$ top
top - 04:36:52 up 2 min, 2 users, load average: 14.39, 4.43, 1.55
Tasks: 140 total, 2 running, 138 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 0.0 id, 99.6 wa, 0.0 hi, 0.0 si, 0.2 st
KiB Mem: 3857324 total, 3778024 used, 79300 free, 192 buffers
KiB Swap: 0 total, 0 used, 0 free. 15680 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32 root 20 0 0 0 0 S 2.4 0.0 0:03.75 kswapd0
1263 root 20 0 97388 19036 0 S 1.3 0.5 0:01.06 kube-controller
1224 root 20 0 28764 11380 0 S 0.7 0.3 0:01.86 etcd
1358 root 20 0 46192 10608 0 S 0.7 0.3 0:00.69 kube-scheduler
1243 root 20 0 372552 343024 0 S 0.6 8.9 0:10.51 etcd
695 root 20 0 889180 52352 0 S 0.4 1.4 0:05.34 dockerd
752 root 20 0 205800 13756 0 S 0.4 0.4 0:00.56 protokube
816 root 20 0 449964 30804 0 S 0.4 0.8 0:02.26 kubelet
1247 root 20 0 3207664 2.856g 0 S 0.4 77.6 0:55.90 kube-apiserver
1279 root 20 0 40848 8900 0 S 0.4 0.2 0:00.46 kube-proxy
1 root 20 0 28788 1940 0 R 0.2 0.1 0:02.06 systemd
157 root 0 -20 0 0 0 S 0.2 0.0 0:00.06 kworker/1:1H
1562 admin 20 0 78320 1092 0 S 0.2 0.0 0:00.04 sshd
1585 admin 20 0 23660 540 0 R 0.2 0.0 0:00.11 top
1758 admin 20 0 33512 320 32 D 0.2 0.0 0:00.04 kubectl
1779 root 20 0 39368 436 0 D 0.2 0.0 0:00.01 docker-containe
Please let me know how to resolve this!
Update: kubelet logs on the master:
admin@ip-172-20-49-150:~$ journalctl -u kubelet -f
Jan 06 05:41:44 ip-172-20-49-150 kubelet[819]: E0106 05:41:44.454586 819 pod_workers.go:182] Error syncing pod 685c903f9066f69a2e17c802cb043ed6 ("etcd-server-events-ip-172-20-49-150.us-west-1.compute.internal_kube-system(685c903f9066f69a2e17c802cb043ed6)"), skipping: failed to "StartContainer" for "etcd-container" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=etcd-container pod=etcd-server-events-ip-172-20-XX-XXX.us-west-1.compute.internal_kube-system(685c903f906b043ed6)"
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454266 819 kuberuntime_manager.go:500] Container {Name:kube-controller-manager Image:gcr.io/google_containers/kube-controller-manager:v1.8.4 Command:[/bin/sh -c /usr/local/bin/kube-controller-manager --allocate-node-cidrs=true --attach-detach-reconcile-sync-period=1m0s --cloud-provider=aws --cluster-cidr=100.96.0.0/11 --cluster-name=uw1b.k8s.ops.goldenratstud.io --cluster-signing-cert-file=/srv/kubernetes/ca.crt --cluster-signing-key-file=/srv/kubernetes/ca.key --configure-cloud-routes=true --kubeconfig=/var/lib/kube-controller-manager/kubeconfig --leader-elect=true --root-ca-file=/srv/kubernetes/ca.crt --service-account-private-key-file=/srv/kubernetes/server.key --use-service-account-credentials=true --v=2 2>&1 | /bin/tee -a /var/log/kube-controller-manager.log] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI}]} VolumeMounts:[{Name:etcssl ReadOnly:true MountPath:/etc/ssl SubPath: MountPropagation:<nil>} {Name:etcpkitls ReadOnly:true MountPath:/etc/pki/tls SubPath: MountPropagation:<nil>} {Name:etcpkica-trust ReadOnly:true MountPath:/etc/pki/ca-trust SubPath: MountPropagation:<nil>} {Name:usrsharessl ReadOnly:true MountPath:/usr/share/ssl SubPath: MountPropagation:<nil>} {Name:usrssl ReadOnly:true MountPath:/usr/ssl SubPath: MountPropagation:<nil>} {Name:usrlibssl ReadOnly:true MountPath:/usr/lib/ssl SubPath: MountPropagation:<nil>} {Name:usrlocalopenssl ReadOnly:true MountPath:/usr/local/openssl SubPath: MountPropagation:<nil>} {Name:varssl ReadOnly:true MountPath:/var/ssl SubPath: MountPropagation:<nil>} {Name:etcopenssl ReadOnly:true MountPath:/etc/openssl SubPath: MountPropagation:<nil>} {Name:srvkube ReadOnly:true MountPath:/srv/kubernetes SubPath: MountPropagation:<nil>} {Name:logfile ReadOnly:false MountPath:/var/log/kube-controller-manager.log SubPath: MountPropagation:<nil>} {Name:varlibkcm 
ReadOnly:true MountPath:/var/lib/kube-controller-manager SubPath: MountPropagation:<nil>}] Live
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: nessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10252,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454658 819 kuberuntime_manager.go:739] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454781 819 kuberuntime_manager.go:749] Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: E0106 05:41:45.454813 819 pod_workers.go:182] Error syncing pod ef6f03ef0b14d853dd38e4c2a5f426dc ("kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"
Jan 06 05:41:47 ip-172-20-49-150 kubelet[819]: I0106 05:41:47.432074 819 container.go:471] Failed to update stats for container "/kubepods/burstable/pod2a5faee9437283d8ac7f396d86d07a03/0f62ea06693a7d4aaf6702d8ca370f2d5d2f1f3c4fdeab09aede15ea5311e47c": unable to determine device info for dir: /var/lib/docker/overlay/ce30183e915076727e708ed10b2ada4d55d1fe6d5c989c1cffc3e29cc00dad94: stat failed on /var/lib/docker/overlay/ce30183e915076727e708ed10b2ada4d55d1fe6d5c989c1cffc3e29cc00dad94 with error: no such file or directory, continuing to push stats
It looks like you started the API server with the flag --insecure-bind-address=127.0.0.1. Port 8080 on your host is not free, so the server cannot start:
Serving insecurely on 127.0.0.1:8080 failed to listen on 127.0.0.1:8080: listen tcp 127.0.0.1:8080: bind: address already in use
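To confirm which process holds the port, one option is to filter the output of `ss`. The helper below is a minimal sketch: `port_listeners` is a hypothetical name, and it assumes the usual `ss -ltn` column layout, where the fourth field is the local address and port.

```shell
# Hypothetical helper: read `ss -ltn`-style output on stdin and print
# the lines whose local address (4th field) ends in the given port.
port_listeners() {
  port="$1"
  # NR > 1 skips the header line; the regex is anchored so that
  # port 80 does not match 8080.
  awk -v p="$port" 'NR > 1 && $4 ~ (":" p "$") { print $0 }'
}
```

On the master this could be run as `sudo ss -ltnp | port_listeners 8080`; with `-p` and root privileges, the matching line also names the owning process. If the listener turns out to be an earlier kube-apiserver instance, the bind error may be a side effect of the restart loop rather than its cause.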
I replaced the old K8s dev master node with a new one but still hit the same issue. After vertically scaling the k8s master from c4.large to c4.xlarge, it works fine!
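This outcome is consistent with the `top` output in the question: kube-apiserver alone holds 77.6% of roughly 3.7 GiB of RAM, there is no swap, almost nothing is cached, and the CPU sits at 99.6% I/O wait while kswapd0 is busy, which suggests the node was starved of memory. A c4.large has 3.75 GiB of RAM and a c4.xlarge has 7.5 GiB. As a rough sketch for spotting this earlier, the hypothetical helper `rss_share` below converts resident set sizes into a share of total memory; it assumes input lines of the form `<rss in KiB> <command>`.

```shell
# Hypothetical helper: read "<rss_kib> <command>" lines on stdin and print
# each command with its resident memory as a percentage of total RAM.
# $1 is the total memory in KiB (the MemTotal value from /proc/meminfo).
rss_share() {
  total_kib="$1"
  awk -v t="$total_kib" '{ printf "%s %.1f%%\n", $2, 100 * $1 / t }'
}
```

A possible invocation on the master, assuming procps `ps`: `ps -eo rss=,comm= --sort=-rss | head -5 | rss_share "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"`. Only the first word of each command name is printed.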