Rancher: kube-system pods stuck in ContainerCreating
I'm trying to start a cluster with a single node (a VM), but some pods, mostly in kube-system, are stuck in ContainerCreating:
> kubectl get pods,svc -owide --all-namespaces
NAMESPACE       NAME                                          READY   STATUS              RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
cattle-system   pod/cattle-cluster-agent-7db88c6b68-bz5dp     0/1     ContainerCreating   0          7m13s   <none>        hdn-dev-app66   <none>           <none>
cattle-system   pod/cattle-node-agent-ccntw                   1/1     Running             0          7m13s   10.105.1.76   hdn-dev-app66   <none>           <none>
cattle-system   pod/kube-api-auth-9kdpw                       1/1     Running             0          7m13s   10.105.1.76   hdn-dev-app66   <none>           <none>
ingress-nginx   pod/default-http-backend-598b7d7dbd-rwvhm     0/1     ContainerCreating   0          7m29s   <none>        hdn-dev-app66   <none>           <none>
ingress-nginx   pod/nginx-ingress-controller-62vhq            1/1     Running             0          7m29s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/coredns-849545576b-w87zr                  0/1     ContainerCreating   0          7m39s   <none>        hdn-dev-app66   <none>           <none>
kube-system     pod/coredns-autoscaler-5dcd676cbd-pj54d       0/1     ContainerCreating   0          7m38s   <none>        hdn-dev-app66   <none>           <none>
kube-system     pod/kube-flannel-d9m6q                        2/2     Running             0          7m43s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/metrics-server-697746ff48-q7cpx           0/1     ContainerCreating   0          7m33s   <none>        hdn-dev-app66   <none>           <none>
kube-system     pod/rke-coredns-addon-deploy-job-npjll        0/1     Completed           0          7m40s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/rke-ingress-controller-deploy-job-b9rs4   0/1     Completed           0          7m30s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/rke-metrics-addon-deploy-job-5rpbj        0/1     Completed           0          7m35s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/rke-network-plugin-deploy-job-lvk2q       0/1     Completed           0          7m50s   10.105.1.76   hdn-dev-app66   <none>           <none>

NAMESPACE       NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default         service/kubernetes             ClusterIP   10.43.0.1      <none>        443/TCP                  8m19s   <none>
ingress-nginx   service/default-http-backend   ClusterIP   10.43.144.25   <none>        80/TCP                   7m29s   app=default-http-backend
kube-system     service/kube-dns               ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP,9153/TCP   7m39s   k8s-app=kube-dns
kube-system     service/metrics-server         ClusterIP   10.43.251.47   <none>        443/TCP                  7m34s   k8s-app=metrics-server
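When a listing like the one above gets long, the stuck pods can be filtered out and described in one pass. This is a minimal bash sketch; `stuck_pods` and `describe_stuck_pods` are hypothetical helper names, and the filter assumes the default `kubectl get pods --all-namespaces --no-headers` layout, where STATUS is the fourth whitespace-separated column.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and prints "<namespace> <pod>" for every pod whose
# STATUS column (4th field) is ContainerCreating.
stuck_pods() {
  awk '$4 == "ContainerCreating" { print $1, $2 }'
}

# Hypothetical wrapper: describe each stuck pod so the sandbox/CNI events
# shown at the end of `kubectl describe` become visible.
describe_stuck_pods() {
  kubectl get pods --all-namespaces --no-headers |
    stuck_pods |
    while read -r ns pod; do
      kubectl describe pod --namespace "$ns" "$pod"
    done
}
```

The filter deliberately reads plain `get pods` output: with `get pods,svc` the NAME column carries a `pod/` prefix, which `kubectl describe pod` would not expect.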
When I describe the failing pods, I get:
Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "345460c8f6399a0cf20956d8ea24d52f5a684ae47c3e8ec247f83d66d56b2baa" network for pod "cattle-cluster-agent-7db88c6b68-bz5dp": networkPlugin cni failed to set up pod "cattle-cluster-agent-7db88c6b68-bz5dp_cattle-system" network: error getting ClusterInformation: connection is unauthorized: clusterinformations.crd.projectcalico.org "default" is forbidden: User "system:node" cannot get resource "clusterinformations" in API group "crd.projectcalico.org" at the cluster scope, failed to clean up sandbox container "345460c8f6399a0cf20956d8ea24d52f5a684ae47c3e8ec247f83d66d56b2baa" network for pod "cattle-cluster-agent-7db88c6b68-bz5dp": networkPlugin cni failed to teardown pod "cattle-cluster-agent-7db88c6b68-bz5dp_cattle-system" network: error getting ClusterInformation: connection is unauthorized: clusterinformations.crd.projectcalico.org "default" is forbidden: User "system:node" cannot get resource "clusterinformations" in API group "crd.projectcalico.org" at the cluster scope]
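The part of this message that matters is the RBAC denial: user `system:node` is not allowed to `get` the `clusterinformations` resource in the `crd.projectcalico.org` API group. A minimal bash sketch that extracts those fields and asks the API server the same question with `kubectl auth can-i`, impersonating that user. `parse_forbidden` and `check_forbidden` are hypothetical names, the `sed` pattern assumes the exact wording of the message above, and impersonation requires the caller to hold `impersonate` rights (cluster-admin does).

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract "<user> <verb> <resource> <group>" from an
# RBAC "forbidden" message read on stdin. Assumes the wording
#   User "X" cannot VERB resource "R" in API group "G"
# as in the sandbox error; if a line repeats the denial, one copy is reported.
parse_forbidden() {
  sed -n 's/.*User "\([^"]*\)" cannot \([a-z]*\) resource "\([^"]*\)" in API group "\([^"]*\)".*/\1 \2 \3 \4/p'
}

# Hypothetical helper: re-ask the denied question as the same user.
# Prints "yes" or "no" (kubectl's answer); returns 1 if nothing was parsed.
check_forbidden() {
  local user verb resource group
  read -r user verb resource group <<< "$(parse_forbidden)"
  [ -n "$user" ] || return 1
  kubectl auth can-i "$verb" "$resource.$group" --as="$user"
}
```

Usage: save the event message to a file and run `check_forbidden < error.txt`; a `no` confirms the problem is missing RBAC permissions rather than networking.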
I've tried re-registering the node, but that didn't help. Any ideas?
Since the error says the request is unauthorized (forbidden), you have to grant RBAC permissions for it to work.
Try adding:
# Binds the existing calico-node ClusterRole to every member of the system:nodes group.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:calico-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
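After applying the binding, one way to confirm it took effect is to impersonate the identity from the error. This is a sketch, assuming the manifest above is saved as `calico-node-binding.yaml` (a hypothetical file name) and that a ClusterRole named `calico-node` exists; `verify_calico_binding` is likewise a hypothetical name. Note that the error names the user `system:node`, while the binding's subject is the group `system:nodes`: a Group subject only matches identities whose credentials carry that group, so the check is run both with and without `--as-group=system:nodes`.

```shell
#!/usr/bin/env bash

# Sketch: apply the ClusterRoleBinding and re-check the denied request.
# calico-node-binding.yaml is a hypothetical file name for that manifest.
verify_calico_binding() {
  local manifest="${1:-calico-node-binding.yaml}"

  # RBAC accepts a binding to a missing role but grants nothing, so check first.
  if ! kubectl get clusterrole calico-node >/dev/null; then
    echo "ClusterRole calico-node not found; the binding would grant nothing" >&2
    return 1
  fi

  kubectl apply -f "$manifest" || return 1

  # A Group subject only matches identities that carry that group, so ask
  # both with and without system:nodes attached to the impersonated user.
  kubectl auth can-i get clusterinformations.crd.projectcalico.org \
    --as=system:node --as-group=system:nodes
  kubectl auth can-i get clusterinformations.crd.projectcalico.org \
    --as=system:node
}
```

If the first check prints `yes` and the second `no`, the binding works, but only for credentials that actually belong to `system:nodes`.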
Solved the problem by following this article on cleaning up (reclaiming) broken nodes: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/cleaning-cluster-nodes/
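Why cleaning the node helped is not stated in the thread. One plausible reading (an inference, not confirmed): this cluster runs flannel (`kube-flannel` in the listing), yet the sandbox error comes from the Calico CNI plugin, which suggests CNI configuration left over from an earlier Calico or Canal install on the same VM. The linked page covers the full cleanup (containers, mounts, directories, network interfaces and more). The bash sketch below covers only removing CNI-related directories; the path list is an assumed subset of the page's list, `clean_cni_state` is a hypothetical name, and it takes a root prefix so it can be tried on a scratch directory before being run against `/` on a node already removed from the cluster.

```shell
#!/usr/bin/env bash

# Sketch: remove leftover CNI / network-plugin state from a node.
# ROOT defaults to "/"; pass a scratch directory to try it safely first.
# The path list is an ASSUMED CNI-related subset of the Rancher cleanup page;
# check the page for the full, current list.
clean_cni_state() {
  local root="${1:-/}"
  local paths=(
    etc/cni
    opt/cni
    var/lib/cni
    var/lib/calico
    run/calico
    run/flannel
    var/run/calico
  )
  local p
  for p in "${paths[@]}"; do
    # ${root%/} strips a trailing slash, so "/" becomes "" and paths stay absolute.
    rm -rf -- "${root%/}/$p"
  done
}
```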