Kubernetes: api-server and controller-manager can't start

I have a running k8s cluster that was set up with kubeadm. My problem is that the api-server and controller-manager pods cannot start because of a bind exception:

failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use

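We recently downgraded docker-ce from version 18.01 to 17.09 on all nodes because of a Docker bug when recreating containers. After the downgrade, however, the cluster worked fine, meaning the api-server and controller-manager were running.

I searched Google and elsewhere for issues related to bind exceptions with the api-server and controller-manager, but could not find anything useful.

I checked that no other process is running on that port on the master node. Things I have tried:

Restarting the kubelet and the Docker daemon: both restart fine, but it has no effect on the problem.

kubeadm / kubectl version: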

 kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

Using weave as the network CNI.

Edit:

Master node

docker ps:

CONTAINER ID        IMAGE                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
59239d32b1e4        weaveworks/weave-npc                            "/usr/bin/weave-npc"     About an hour ago   Up About an hour                        k8s_weave-npc_weave-net-74vsh_kube-system_99f6ee35-0f56-11e8-95e1-1614e1ecd749_0
7cb888c1ab4d        weaveworks/weave-kube                           "/home/weave/launc..."   About an hour ago   Up About an hour                        k8s_weave_weave-net-74vsh_kube-system_99f6ee35-0f56-11e8-95e1-1614e1ecd749_0
1ad50c15f816        gcr.io/google_containers/pause-amd64:3.0        "/pause"                 About an hour ago   Up About an hour                        k8s_POD_weave-net-74vsh_kube-system_99f6ee35-0f56-11e8-95e1-1614e1ecd749_0
ecb845f1dfae        gcr.io/google_containers/etcd-amd64             "etcd --advertise-..."   2 hours ago         Up 2 hours                              k8s_etcd_etcd-kube01_kube-system_1b6fafb5dc39ea18814d9bc27da851eb_6
001234690d7a        gcr.io/google_containers/kube-scheduler-amd64   "kube-scheduler --..."   2 hours ago         Up 2 hours                              k8s_kube-scheduler_kube-scheduler-kube01_kube-system_69c12074e336b0dbbd0a1666ce05226a_3
0ce04f222f08        gcr.io/google_containers/pause-amd64:3.0        "/pause"                 2 hours ago         Up 2 hours                              k8s_POD_kube-scheduler-kube01_kube-system_69c12074e336b0dbbd0a1666ce05226a_3
0a3d9eabd961        gcr.io/google_containers/pause-amd64:3.0        "/pause"                 2 hours ago         Up 2 hours                              k8s_POD_kube-apiserver-kube01_kube-system_95c67f50e46db081012110e8bcce9dfc_3
c77767104eb9        gcr.io/google_containers/pause-amd64:3.0        "/pause"                 2 hours ago         Up 2 hours                              k8s_POD_etcd-kube01_kube-system_1b6fafb5dc39ea18814d9bc27da851eb_4
319873797a8a        gcr.io/google_containers/pause-amd64:3.0        "/pause"                 2 hours ago         Up 2 hours                              k8s_POD_kube-controller-manager-kube01_kube-system_f64b9b5ba10a00baa5c176d5877e8671_4

journalctl (full):

Feb 11 19:51:03 kube01 kubelet[3195]: I0211 19:51:03.205824    3195 kuberuntime_manager.go:758] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"
Feb 11 19:51:03 kube01 kubelet[3195]: I0211 19:51:03.205991    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)
Feb 11 19:51:03 kube01 kubelet[3195]: E0211 19:51:03.206039    3195 pod_workers.go:186] Error syncing pod f64b9b5ba10a00baa5c176d5877e8671 ("kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"
Feb 11 19:51:03 kube01 kubelet[3195]: I0211 19:51:03.206161    3195 kuberuntime_manager.go:514] Container {Name:kube-apiserver Image:gcr.io/google_containers/kube-apiserver-amd64:v1.9.2 Command:[kube-apiserver --client-ca-file=/etc/kubernetes/pki/ca.crt --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota --allow-privileged=true --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --requestheader-extra-headers-prefix=X-Remote-Extra- --advertise-address=207.154.252.249 --service-cluster-ip-range=10.96.0.0/12 --insecure-port=0 --enable-bootstrap-token-auth=true --requestheader-allowed-names=front-proxy-client --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-username-headers=X-Remote-User --service-account-key-file=/etc/kubernetes/pki/sa.pub --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --secure-port=6443 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-group-headers=X-Remote-Group --tls-private-key-file=/etc/kubernetes/pki/apiserver.key --authorization-mode=Node,RBAC --etcd-servers=http://127.0.0.1:2379] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>} {Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>}] VolumeDevices:[] 
LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:6443,Host:207.154.252.249,Scheme:HTTPS,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil Terminat
Feb 11 19:51:03 kube01 kubelet[3195]: ionMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 11 19:51:03 kube01 kubelet[3195]: I0211 19:51:03.206234    3195 kuberuntime_manager.go:758] checking backoff for container "kube-apiserver" in pod "kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:03 kube01 kubelet[3195]: I0211 19:51:03.206350    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)
Feb 11 19:51:03 kube01 kubelet[3195]: E0211 19:51:03.206381    3195 pod_workers.go:186] Error syncing pod 95c67f50e46db081012110e8bcce9dfc ("kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:12 kube01 kubelet[3195]: E0211 19:51:12.816797    3195 fs.go:418] Stat fs failed. Error: no such file or directory
Feb 11 19:51:14 kube01 kubelet[3195]: I0211 19:51:14.203327    3195 kuberuntime_manager.go:514] Container {Name:kube-apiserver Image:gcr.io/google_containers/kube-apiserver-amd64:v1.9.2 Command:[kube-apiserver --client-ca-file=/etc/kubernetes/pki/ca.crt --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota --allow-privileged=true --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --requestheader-extra-headers-prefix=X-Remote-Extra- --advertise-address=207.154.252.249 --service-cluster-ip-range=10.96.0.0/12 --insecure-port=0 --enable-bootstrap-token-auth=true --requestheader-allowed-names=front-proxy-client --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-username-headers=X-Remote-User --service-account-key-file=/etc/kubernetes/pki/sa.pub --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --secure-port=6443 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-group-headers=X-Remote-Group --tls-private-key-file=/etc/kubernetes/pki/apiserver.key --authorization-mode=Node,RBAC --etcd-servers=http://127.0.0.1:2379] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>} {Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>}] VolumeDevices:[] 
LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:6443,Host:207.154.252.249,Scheme:HTTPS,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil Terminat
Feb 11 19:51:14 kube01 kubelet[3195]: ionMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 11 19:51:14 kube01 kubelet[3195]: I0211 19:51:14.203631    3195 kuberuntime_manager.go:758] checking backoff for container "kube-apiserver" in pod "kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:14 kube01 kubelet[3195]: I0211 19:51:14.203833    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)
Feb 11 19:51:14 kube01 kubelet[3195]: E0211 19:51:14.203886    3195 pod_workers.go:186] Error syncing pod 95c67f50e46db081012110e8bcce9dfc ("kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:15 kube01 kubelet[3195]: I0211 19:51:15.203837    3195 kuberuntime_manager.go:514] Container {Name:kube-controller-manager Image:gcr.io/google_containers/kube-controller-manager-amd64:v1.9.2 Command:[kube-controller-manager --leader-elect=true --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --address=127.0.0.1 --use-service-account-credentials=true --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI}]} VolumeMounts:[{Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>} {Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>} {Name:kubeconfig ReadOnly:true MountPath:/etc/kubernetes/controller-manager.conf SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10252,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 11 19:51:15 kube01 kubelet[3195]: I0211 19:51:15.205830    3195 kuberuntime_manager.go:758] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"
Feb 11 19:51:15 kube01 kubelet[3195]: I0211 19:51:15.207429    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)
Feb 11 19:51:15 kube01 kubelet[3195]: E0211 19:51:15.207813    3195 pod_workers.go:186] Error syncing pod f64b9b5ba10a00baa5c176d5877e8671 ("kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"
Feb 11 19:51:26 kube01 kubelet[3195]: I0211 19:51:26.203361    3195 kuberuntime_manager.go:514] Container {Name:kube-apiserver Image:gcr.io/google_containers/kube-apiserver-amd64:v1.9.2 Command:[kube-apiserver --client-ca-file=/etc/kubernetes/pki/ca.crt --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota --allow-privileged=true --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --requestheader-extra-headers-prefix=X-Remote-Extra- --advertise-address=207.154.252.249 --service-cluster-ip-range=10.96.0.0/12 --insecure-port=0 --enable-bootstrap-token-auth=true --requestheader-allowed-names=front-proxy-client --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-username-headers=X-Remote-User --service-account-key-file=/etc/kubernetes/pki/sa.pub --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --secure-port=6443 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-group-headers=X-Remote-Group --tls-private-key-file=/etc/kubernetes/pki/apiserver.key --authorization-mode=Node,RBAC --etcd-servers=http://127.0.0.1:2379] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>} {Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>}] VolumeDevices:[] 
LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:6443,Host:207.154.252.249,Scheme:HTTPS,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil Terminat
Feb 11 19:51:26 kube01 kubelet[3195]: ionMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 11 19:51:26 kube01 kubelet[3195]: I0211 19:51:26.205258    3195 kuberuntime_manager.go:758] checking backoff for container "kube-apiserver" in pod "kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:26 kube01 kubelet[3195]: I0211 19:51:26.205670    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)
Feb 11 19:51:26 kube01 kubelet[3195]: E0211 19:51:26.205965    3195 pod_workers.go:186] Error syncing pod 95c67f50e46db081012110e8bcce9dfc ("kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:29 kube01 kubelet[3195]: I0211 19:51:29.203234    3195 kuberuntime_manager.go:514] Container {Name:kube-controller-manager Image:gcr.io/google_containers/kube-controller-manager-amd64:v1.9.2 Command:[kube-controller-manager --leader-elect=true --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --address=127.0.0.1 --use-service-account-credentials=true --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI}]} VolumeMounts:[{Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>} {Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>} {Name:kubeconfig ReadOnly:true MountPath:/etc/kubernetes/controller-manager.conf SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10252,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 11 19:51:29 kube01 kubelet[3195]: I0211 19:51:29.207713    3195 kuberuntime_manager.go:758] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"
Feb 11 19:51:29 kube01 kubelet[3195]: I0211 19:51:29.208492    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)
Feb 11 19:51:29 kube01 kubelet[3195]: E0211 19:51:29.208875    3195 pod_workers.go:186] Error syncing pod f64b9b5ba10a00baa5c176d5877e8671 ("kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-kube01_kube-system(f64b9b5ba10a00baa5c176d5877e8671)"
Feb 11 19:51:32 kube01 kubelet[3195]: E0211 19:51:32.369188    3195 fs.go:418] Stat fs failed. Error: no such file or directory
Feb 11 19:51:39 kube01 kubelet[3195]: I0211 19:51:39.203802    3195 kuberuntime_manager.go:514] Container {Name:kube-apiserver Image:gcr.io/google_containers/kube-apiserver-amd64:v1.9.2 Command:[kube-apiserver --client-ca-file=/etc/kubernetes/pki/ca.crt --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota --allow-privileged=true --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --requestheader-extra-headers-prefix=X-Remote-Extra- --advertise-address=207.154.252.249 --service-cluster-ip-range=10.96.0.0/12 --insecure-port=0 --enable-bootstrap-token-auth=true --requestheader-allowed-names=front-proxy-client --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-username-headers=X-Remote-User --service-account-key-file=/etc/kubernetes/pki/sa.pub --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --secure-port=6443 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-group-headers=X-Remote-Group --tls-private-key-file=/etc/kubernetes/pki/apiserver.key --authorization-mode=Node,RBAC --etcd-servers=http://127.0.0.1:2379] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>} {Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>}] VolumeDevices:[] 
LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:6443,Host:207.154.252.249,Scheme:HTTPS,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil Terminat
Feb 11 19:51:39 kube01 kubelet[3195]: ionMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 11 19:51:39 kube01 kubelet[3195]: I0211 19:51:39.205508    3195 kuberuntime_manager.go:758] checking backoff for container "kube-apiserver" in pod "kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"
Feb 11 19:51:39 kube01 kubelet[3195]: I0211 19:51:39.206071    3195 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)
Feb 11 19:51:39 kube01 kubelet[3195]: E0211 19:51:39.206336    3195 pod_workers.go:186] Error syncing pod 95c67f50e46db081012110e8bcce9dfc ("kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-kube01_kube-system(95c67f50e46db081012110e8bcce9dfc)"

kubeadm.conf

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS

docker info (cgroup):

WARNING: No swap limit support
Cgroup Driver: cgroupfs

Kernel:

Linux kube01 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Distribution:

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:    16.04
Codename:   xenial

The problem is simple: some service is already bound to port 6443. You can check which one with netstat -lutpn | grep 6443, then kill that process and restart the kubelet service.

$ netstat -lutpn | grep 6443
tcp6       0      0 :::6443                 :::*                    LISTEN      11395/some-service

$ kill 11395

$ service kubelet restart

This should fix the problem.
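A plain grep for 6443 can also match other ports or connection lines that merely contain that number, so it may help to narrow the output to real listeners on the exact port before killing anything. The following is only a sketch, assuming netstat's usual column layout (local address in column 4, state in column 6, PID/program in column 7). The helper name `listener_on_port` is hypothetical.

```shell
# Hypothetical helper: read `netstat -lntp` output on stdin and print the
# PID/program column for every TCP socket listening on exactly the given port.
listener_on_port() {
  awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}

# Usage (run as root so netstat can fill in the PID column):
#   netstat -lntp | listener_on_port 6443
```

If the program it prints is a leftover kube-apiserver process rather than an unrelated service, that points to stale containers rather than a genuine port conflict.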

With Kubernetes, this usually happens when the cluster was not reset properly and the old containers were not cleaned up.

Do this:

$ kubeadm reset
$ docker rm -f $(docker ps -a -q)
$ kubeadm init <options> # new initialization

This means the nodes will have to rejoin the cluster.
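Note that `docker rm -f $(docker ps -a -q)` removes every container on the host, not only the Kubernetes ones. If other workloads share the machine, the cleanup can be limited to kubelet-managed containers, whose names start with `k8s_` (as in the question's docker ps output). A minimal sketch, with the hypothetical helper name `k8s_container_ids`:

```shell
# Hypothetical helper: read "ID NAME" pairs on stdin and print only the IDs
# of containers whose name starts with the kubelet's k8s_ prefix.
k8s_container_ids() {
  awk '$2 ~ /^k8s_/ { print $1 }'
}

# Usage (xargs -r, a GNU extension, skips docker rm when nothing matches):
#   docker ps -a --format '{{.ID}} {{.Names}}' | k8s_container_ids | xargs -r docker rm -f
```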

In my case, what helped was:

  1. Disabling swap on all nodes (master and workers)
  2. Rebooting each node

After that, everything worked fine.
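Disabling swap usually means turning it off for the running system and keeping it off across reboots. As far as I know, the kubelet by default refuses to start while swap is enabled (its `--fail-swap-on` flag), which may be why this helped. The sketch below assumes a standard fstab layout with the filesystem type in the third field. The helper name `comment_out_swap` is hypothetical, and it only prints the modified file rather than editing /etc/fstab in place.

```shell
# Hypothetical helper: print the given fstab-style file with every active
# swap entry commented out. The file itself is not modified.
comment_out_swap() {
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Usage (as root):
#   swapoff -a                                    # turn swap off right now
#   comment_out_swap /etc/fstab > /tmp/fstab.new  # review, then install it
```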