Dev k8s master shows extra load and kubectl get pods returns no output

My dev k8s master is showing extra load, and as a result I get no output when listing pods:

admin@ip-172-20-49-150:~$ kubectl get po -n cog-stage

^C
admin@ip-172-20-49-150:~$

admin@ip-172-20-49-150:~$ top

top - 04:36:52 up 2 min,  2 users,  load average: 14.39, 4.43, 1.55
Tasks: 140 total,   2 running, 138 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni,  0.0 id, 99.6 wa,  0.0 hi,  0.0 si,  0.2 st
KiB Mem:   3857324 total,  3778024 used,    79300 free,      192 buffers
KiB Swap:        0 total,        0 used,        0 free.    15680 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   32 root      20   0       0      0      0 S   2.4  0.0   0:03.75 kswapd0
 1263 root      20   0   97388  19036      0 S   1.3  0.5   0:01.06 kube-controller
 1224 root      20   0   28764  11380      0 S   0.7  0.3   0:01.86 etcd
 1358 root      20   0   46192  10608      0 S   0.7  0.3   0:00.69 kube-scheduler
 1243 root      20   0  372552 343024      0 S   0.6  8.9   0:10.51 etcd
  695 root      20   0  889180  52352      0 S   0.4  1.4   0:05.34 dockerd
  752 root      20   0  205800  13756      0 S   0.4  0.4   0:00.56 protokube
  816 root      20   0  449964  30804      0 S   0.4  0.8   0:02.26 kubelet
 1247 root      20   0 3207664 2.856g      0 S   0.4 77.6   0:55.90 kube-apiserver
 1279 root      20   0   40848   8900      0 S   0.4  0.2   0:00.46 kube-proxy
    1 root      20   0   28788   1940      0 R   0.2  0.1   0:02.06 systemd
  157 root       0 -20       0      0      0 S   0.2  0.0   0:00.06 kworker/1:1H
 1562 admin     20   0   78320   1092      0 S   0.2  0.0   0:00.04 sshd
 1585 admin     20   0   23660    540      0 R   0.2  0.0   0:00.11 top
 1758 admin     20   0   33512    320     32 D   0.2  0.0   0:00.04 kubectl
 1779 root      20   0   39368    436      0 D   0.2  0.0   0:00.01 docker-containe

Please let me know how to resolve this issue!

Update: kubelet logs on the master:

admin@ip-172-20-49-150:~$ journalctl -u kubelet -f

Jan 06 05:41:44 ip-172-20-49-150 kubelet[819]: E0106 05:41:44.454586     819 pod_workers.go:182] Error syncing pod 685c903f9066f69a2e17c802cb043ed6 ("etcd-server-events-ip-172-20-49-150.us-west-1.compute.internal_kube-system(685c903f9066f69a2e17c802cb043ed6)"), skipping: failed to "StartContainer" for "etcd-container" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=etcd-container pod=etcd-server-events-ip-172-20-XX-XXX.us-west-1.compute.internal_kube-system(685c903f906b043ed6)"
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454266     819 kuberuntime_manager.go:500] Container {Name:kube-controller-manager Image:gcr.io/google_containers/kube-controller-manager:v1.8.4 Command:[/bin/sh -c /usr/local/bin/kube-controller-manager --allocate-node-cidrs=true --attach-detach-reconcile-sync-period=1m0s --cloud-provider=aws --cluster-cidr=100.96.0.0/11 --cluster-name=uw1b.k8s.ops.goldenratstud.io --cluster-signing-cert-file=/srv/kubernetes/ca.crt --cluster-signing-key-file=/srv/kubernetes/ca.key --configure-cloud-routes=true --kubeconfig=/var/lib/kube-controller-manager/kubeconfig --leader-elect=true --root-ca-file=/srv/kubernetes/ca.crt --service-account-private-key-file=/srv/kubernetes/server.key --use-service-account-credentials=true --v=2 2>&1 | /bin/tee -a /var/log/kube-controller-manager.log] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI}]} VolumeMounts:[{Name:etcssl ReadOnly:true MountPath:/etc/ssl SubPath: MountPropagation:<nil>} {Name:etcpkitls ReadOnly:true MountPath:/etc/pki/tls SubPath: MountPropagation:<nil>} {Name:etcpkica-trust ReadOnly:true MountPath:/etc/pki/ca-trust SubPath: MountPropagation:<nil>} {Name:usrsharessl ReadOnly:true MountPath:/usr/share/ssl SubPath: MountPropagation:<nil>} {Name:usrssl ReadOnly:true MountPath:/usr/ssl SubPath: MountPropagation:<nil>} {Name:usrlibssl ReadOnly:true MountPath:/usr/lib/ssl SubPath: MountPropagation:<nil>} {Name:usrlocalopenssl ReadOnly:true MountPath:/usr/local/openssl SubPath: MountPropagation:<nil>} {Name:varssl ReadOnly:true MountPath:/var/ssl SubPath: MountPropagation:<nil>} {Name:etcopenssl ReadOnly:true MountPath:/etc/openssl SubPath: MountPropagation:<nil>} {Name:srvkube ReadOnly:true MountPath:/srv/kubernetes SubPath: MountPropagation:<nil>} {Name:logfile ReadOnly:false MountPath:/var/log/kube-controller-manager.log SubPath: MountPropagation:<nil>} {Name:varlibkcm 
ReadOnly:true MountPath:/var/lib/kube-controller-manager SubPath: MountPropagation:<nil>}] Live
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: nessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10252,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454658     819 kuberuntime_manager.go:739] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454781     819 kuberuntime_manager.go:749] Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: E0106 05:41:45.454813     819 pod_workers.go:182] Error syncing pod ef6f03ef0b14d853dd38e4c2a5f426dc ("kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"
Jan 06 05:41:47 ip-172-20-49-150 kubelet[819]: I0106 05:41:47.432074     819 container.go:471] Failed to update stats for container "/kubepods/burstable/pod2a5faee9437283d8ac7f396d86d07a03/0f62ea06693a7d4aaf6702d8ca370f2d5d2f1f3c4fdeab09aede15ea5311e47c": unable to determine device info for dir: /var/lib/docker/overlay/ce30183e915076727e708ed10b2ada4d55d1fe6d5c989c1cffc3e29cc00dad94: stat failed on /var/lib/docker/overlay/ce30183e915076727e708ed10b2ada4d55d1fe6d5c989c1cffc3e29cc00dad94 with error: no such file or directory, continuing to push stats

It looks like you have started the API server with the flag --insecure-bind-address=127.0.0.1. Port 8080 on your host is not free, so the API server cannot start.

Serving insecurely on 127.0.0.1:8080 failed to listen on 127.0.0.1:8080: listen tcp 127.0.0.1:8080: bind: address already in use

I replaced the old K8s dev master node with a new one but still hit the same issue. After vertically scaling the k8s master from c4.large to c4.xlarge, it now works fine!
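The fact that a larger instance fixed it fits the top output in the question, which looks like memory exhaustion rather than CPU load: almost all CPU time is I/O wait, there is no swap, very little page cache is left, and kube-apiserver alone holds 77.6% of RAM. This is my reading of the numbers, not something the logs state outright. A minimal sketch of how to pull those two figures out of the top summary; the function names iowait_pct and mem_used_pct are made up for illustration:

```shell
# Extract the I/O-wait percentage from a top(1) "%Cpu(s)" summary line.
iowait_pct() {
  printf '%s\n' "$1" | sed -n 's/.* \([0-9.]*\) wa,.*/\1/p'
}

# Percentage of memory in use, given total and used KiB from the "KiB Mem" line.
mem_used_pct() {
  awk -v total="$1" -v used="$2" 'BEGIN { printf "%.1f\n", used * 100 / total }'
}

# Values copied from the top output in the question.
cpu_line='%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni,  0.0 id, 99.6 wa,  0.0 hi,  0.0 si,  0.2 st'
iowait_pct "$cpu_line"          # prints 99.6
mem_used_pct 3857324 3778024    # prints 97.9
```

With about 98% of memory used and no swap, the kernel (kswapd0 at the top of the process list) has to keep evicting and re-reading pages from disk, which would explain the 99.6% I/O wait and the hanging kubectl. A c4.xlarge has 7.5 GiB of RAM against 3.75 GiB on a c4.large, so the API server has room again.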