Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

Question

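I am following this tutorial, and I am having trouble accessing the Kubernetes dashboard. I already created another question about it, which you can see here (Building a Kubernetes cluster), but while digging into my cluster I came to think the problem might lie elsewhere, which is why I am creating a new question.

I start my master by running the following commands: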

> kubeadm reset 
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later

> kubectl apply -f https://git.io/weave-kube/

> kubectl -n kube-system get pod
NAME                                READY   STATUS  RESTARTS    AGE
coredns-fb8b8dccf-kb2zq             0/1     Pending 0           2m46s
coredns-fb8b8dccf-nnc5n             0/1     Pending 0           2m46s
etcd-kubemaster                     1/1     Running 0           93s
kube-apiserver-kubemaster           1/1     Running 0           93s
kube-controller-manager-kubemaster  1/1     Running 0           113s
kube-proxy-lxhvs                    1/1     Running 0           2m46s
kube-scheduler-kubemaster           1/1     Running 0           93s

Here we can see that I have two coredns pods stuck in Pending status forever, and when I run the command:

> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq

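I can see the following warning in the Events section:

FailedScheduling: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

Since it is a Warning and not an Error, and since taints do not mean much to me as a Kubernetes novice, I tried to join a node to the master (using the previously saved command):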

> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
    --discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]

> ssh [USER]@[WORKER_IP] 'bash' < join.sh

This node has joined the cluster.

On the master, I check that the node has joined:

> kubectl get nodes 
NAME        STATUS      ROLES   AGE     VERSION
kubemaster  NotReady    master  13m     v1.14.1
kubeslave1  NotReady    <none>  31s     v1.14.1

Then I check my pods:

> kubectl -n kube-system get pod
NAME                                READY   STATUS              RESTARTS    AGE
coredns-fb8b8dccf-kb2zq             0/1     Pending             0           14m
coredns-fb8b8dccf-nnc5n             0/1     Pending             0           14m
etcd-kubemaster                     1/1     Running             0           13m
kube-apiserver-kubemaster           1/1     Running             0           13m
kube-controller-manager-kubemaster  1/1     Running             0           13m
kube-proxy-lxhvs                    1/1     Running             0           14m
kube-proxy-xllx4                    0/1     ContainerCreating   0           2m16s
kube-scheduler-kubemaster           1/1     Running             0           13m

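We can see that another kube-proxy pod has been created and is stuck in ContainerCreating status.

When I describe it again:

> kubectl -n kube-system describe pod kube-proxy-xllx4

I can see multiple identical warnings in the Events section: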

Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:43133->[::1]:53: read: connection refused
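
The lookup on [::1]:53 in the message above suggests that the worker node has no reachable DNS resolver: the name is being resolved against a DNS server on localhost and nothing is listening there. A minimal sketch for checking this on the worker, assuming the resolver configuration lives in /etc/resolv.conf. The name check_resolv is a hypothetical helper, not part of any Kubernetes tooling:

```shell
# Classify a resolv.conf-style file (default: /etc/resolv.conf).
# Prints "none" if it has no nameserver lines, "loopback-only" if every
# nameserver is a loopback address, and "ok" otherwise.
# Note: a local stub resolver (for example 127.0.0.53) also counts as
# loopback here and may be perfectly fine if it is actually running.
check_resolv() (
  file=${1:-/etc/resolv.conf}
  total=0
  loopback=0
  while read -r keyword addr rest || [ -n "$keyword" ]; do
    [ "$keyword" = "nameserver" ] || continue
    total=$((total + 1))
    case $addr in
      127.*|::1) loopback=$((loopback + 1)) ;;
    esac
  done < "$file"
  if [ "$total" -eq 0 ]; then
    echo "none"
  elif [ "$loopback" -eq "$total" ]; then
    echo "loopback-only"
  else
    echo "ok"
  fi
)
```

Running check_resolv on the worker and getting none or loopback-only would point to the node's own DNS setup rather than to Kubernetes itself.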

Here are my images:

> docker image ls
REPOSITORY                          TAG     
k8s.gcr.io/kube-proxy               v1.14.1 
k8s.gcr.io/kube-apiserver           v1.14.1 
k8s.gcr.io/kube-controller-manager  v1.14.1 
k8s.gcr.io/kube-scheduler           v1.14.1 
k8s.gcr.io/coredns                  1.3.1   
k8s.gcr.io/etcd                     3.3.10  
k8s.gcr.io/pause                    3.1

So, for the dashboard part, I tried to start it with the command:

> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml

But the dashboard pod is stuck in Pending status.

> kubectl -n kube-system get pod
NAME                                    READY   STATUS              RESTARTS    AGE
coredns-fb8b8dccf-kb2zq                 0/1     Pending             0           40m
coredns-fb8b8dccf-nnc5n                 0/1     Pending             0           40m
etcd-kubemaster                         1/1     Running             0           38m
kube-apiserver-kubemaster               1/1     Running             0           38m
kube-controller-manager-kubemaster      1/1     Running             0           39m
kube-proxy-lxhvs                        1/1     Running             0           40m
kube-proxy-xllx4                        0/1     ContainerCreating   0           27m
kube-scheduler-kubemaster               1/1     Running             0           38m
kubernetes-dashboard-5f7b999d65-qn8qn   1/1     Pending             0           8s

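So, although my problem was originally that I could not access my dashboard, I guess the real problem goes deeper than that.

I know I have just dumped a lot of information here, but I am a k8s beginner and I am completely lost on this.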

Answer 1

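It is actually the opposite of a deep or serious problem. It is a trivial one. Whenever you see a pod stuck in Pending status, it means the scheduler is having a hard time scheduling that pod, most often because there are not enough resources on the node.

In your case, the node has a taint and your pod has no toleration for it. What you have to do is describe the node and get the taints: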

kubectl describe node | grep -i taints

Note: you may have more than one taint. So you may want to run kubectl describe no NODE, because with grep you will only see one taint.
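
To see every taint without reading the whole describe output, the Taints field can be extracted together with its continuation lines. A rough sketch, assuming the describe output prints additional taints on indented lines directly below the Taints: line. The name list_taints is a hypothetical helper:

```shell
# Read `kubectl describe node` output on stdin and print one taint per line,
# including taints listed on indented continuation lines under "Taints:".
list_taints() {
  awk '
    /^Taints:/ { in_taints = 1; sub(/^Taints:[ \t]*/, ""); print; next }
    in_taints && /^[ \t]+[^ \t]/ { sub(/^[ \t]+/, ""); print; next }
    { in_taints = 0 }
  '
}
```

It would be used as kubectl describe node NODE | list_taints.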

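Once you have the taint, which will look something like hello=world:NoSchedule, meaning key=value:effect, you have to add a tolerations section to your Deployment. Here is an example Deployment so you can see what it should look like: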

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  selector:                     # required by apps/v1; must match the template labels
    matchLabels:
      app: nginx
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute       # could also be NoSchedule or PreferNoSchedule
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600 # only valid with the NoExecute effect

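As you can see, there is a tolerations section in the YAML. So if I have a node with the taint node=not-ready:NoExecute, no pod will be scheduled on that node unless it has this toleration.

You can also remove the taint if you do not need it. To remove a taint, describe the node, get the key of the taint, and run: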
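
The key=value:effect format can be split apart with plain shell parameter expansion, which may help when filling in the key, value, and effect fields of a toleration. A minimal sketch. The name parse_taint is a hypothetical helper, and taints without a value (key:effect) are assumed to be valid input as well:

```shell
# Split a taint string of the form key=value:effect (or key:effect)
# and print its three parts on one line.
parse_taint() (
  taint=$1
  effect=${taint##*:}        # text after the last ':'
  key_value=${taint%:*}      # text before the last ':'
  key=${key_value%%=*}       # text before the first '='
  case $key_value in
    *=*) value=${key_value#*=} ;;
    *)   value= ;;           # taint has no value part
  esac
  printf 'key=%s value=%s effect=%s\n' "$key" "$value" "$effect"
)
```

For example, parse_taint hello=world:NoSchedule prints key=hello value=world effect=NoSchedule.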

kubectl taint node NODE key-

Hope it makes sense. Just add this section to your Deployment and it will work.

Answer 2

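I ran into the problem of coredns pods being stuck in Pending status when setting up my own cluster; I solved it by adding a pod network.

It looks like the node is tainted as not-ready because no network plugin is installed. Installing the plugin removes the taint, and the pods can then be scheduled. In my case, adding flannel solved the problem.

Edit: there is a note about this in the official k8s documentation - Create cluster with kubeadm: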

The network must be deployed before any applications. Also, CoreDNS will not start up before a network is installed. kubeadm only supports Container Network Interface (CNI) based networks (and does not support kubenet).
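
One way to check whether a pod network has actually been installed on a node is to look for a CNI configuration file. A rough sketch, assuming the kubelet uses the default CNI configuration directory /etc/cni/net.d. The name has_cni_config is a hypothetical helper:

```shell
# Report whether a directory (default: /etc/cni/net.d) contains a CNI
# configuration file. Exits 0 and prints the first match, or exits 1.
has_cni_config() (
  dir=${1:-/etc/cni/net.d}
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    if [ -e "$f" ]; then     # an unmatched glob stays literal and fails -e
      echo "found CNI config: $f"
      exit 0
    fi
  done
  echo "no CNI config found in $dir"
  exit 1
)
```

If this reports no configuration on a node that stays NotReady, the network add-on probably did not get deployed there, which would be consistent with the not-ready taint described above.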

Answer 3

Set up the flannel network tool.

Run the commands:

$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
