Fresh install: Kubernetes worker nodes never become "Ready"

I've been fighting a Kubernetes install problem. We stood up a new OpenStack environment (the old one had failed), and the scripts that worked in the old environment fail in the new one.

We're installing K8s v1.5.4 with these scripts: https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic

CoreOS 1298.7.0

The master seems fine. I can deploy pods to it, and running kubectl get nodes always shows it as Ready.

The worker install script runs, but the worker never reaches Ready status.

kubectl get nodes --show-labels
NAME             STATUS                     AGE       LABELS
MYIP.118.240.122   Ready,SchedulingDisabled   7m        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.122
MYIP.118.240.129   NotReady                   5m        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.129

If I run kubectl describe node MYIP.118.240.129 I get the following:

(testtest)➜  dev kubectl describe node MYIP.118.240.129
Name:           MYIP.118.240.129
Role:
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=MYIP.118.240.129
Taints:         <none>
CreationTimestamp:  Fri, 14 Apr 2017 15:27:47 -0600
Phase:
Conditions:
  Type          Status      LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------      -----------------           ------------------          ------              -------
  OutOfDisk         Unknown     Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:28:29 -0600     NodeStatusUnknown       Kubelet stopped posting node status.
  MemoryPressure    False       Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:27:47 -0600     KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure      False       Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:27:47 -0600     KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready         Unknown     Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:28:29 -0600     NodeStatusUnknown       Kubelet stopped posting node status.
Addresses:      MYIP.118.240.129,MYIP.118.240.129,MYIP.118.240.129
Capacity:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   1
 memory:                2052924Ki
 pods:                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   1
 memory:                2052924Ki
 pods:                  110
System Info:
 Machine ID:            efee03ac51c641888MYIP50dfa2a40350d
 System UUID:           4467C959-37FE-48ED-A263-C36DD0D445F1
 Boot ID:           50eb5e93-5aed-441b-b3ef-36da1472e4ea
 Kernel Version:        4.9.16-coreos-r1
 OS Image:          Container Linux by CoreOS 1298.7.0 (Ladybug)
 Operating System:      linux
 Architecture:          amd64
 Container Runtime Version: docker://1.12.6
 Kubelet Version:       v1.5.4+coreos.0
 Kube-Proxy Version:        v1.5.4+coreos.0
ExternalID:         MYIP.118.240.129
Non-terminated Pods:        (5 in total)
  Namespace         Name                        CPU Requests    CPU Limits  Memory Requests Memory Limits
  ---------         ----                        ------------    ----------  --------------- -------------
  kube-system           heapster-v1.2.0-216693398-sfz1m         50m (5%)    50m (5%)    90Mi (4%)   90Mi (4%)
  kube-system           kube-dns-782804071-psmfc            260m (26%)  0 (0%)      140Mi (6%)  220Mi (10%)
  kube-system           kube-dns-autoscaler-2715466192-jmb3h        20m (2%)    0 (0%)      10Mi (0%)   0 (0%)
  kube-system           kube-proxy-MYIP.118.240.129         0 (0%)      0 (0%)      0 (0%)      0 (0%)
  kube-system           kubernetes-dashboard-3543765157-w8zv2       100m (10%)  100m (10%)  50Mi (2%)   50Mi (2%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ------------  ----------  --------------- -------------
  430m (43%)    150m (15%)  290Mi (14%) 360Mi (17%)
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  11m       11m     1   {kubelet MYIP.118.240.129}          Normal      Starting        Starting kubelet.
  11m       11m     1   {kubelet MYIP.118.240.129}          Warning     ImageGCFailed       unable to find data for container /
  11m       11m     2   {kubelet MYIP.118.240.129}          Normal      NodeHasSufficientDisk   Node MYIP.118.240.129 status is now: NodeHasSufficientDisk
  11m       11m     2   {kubelet MYIP.118.240.129}          Normal      NodeHasSufficientMemory Node MYIP.118.240.129 status is now: NodeHasSufficientMemory
  11m       11m     2   {kubelet MYIP.118.240.129}          Normal      NodeHasNoDiskPressure   Node MYIP.118.240.129 status is now: NodeHasNoDiskPressure
(testtest)➜  dev

All ports on the internal network between the worker and the master are open.

If I run docker ps on the worker I get:

ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS               NAMES
c25cf12b43f3        quay.io/coreos/hyperkube:v1.5.4_coreos.0   "/hyperkube proxy --m"   4 minutes ago       Up 4 minutes                            k8s_kube-proxy.96aded63_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_5ba9628a
c4d14dfd7d52        gcr.io/google_containers/pause-amd64:3.0   "/pause"                 6 minutes ago       Up 6 minutes                            k8s_POD.d8dbe16c_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_e8a1c6d6

Here are the kubelet logs after it ran over the whole weekend:

Apr 17 20:53:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:53:15.507939    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:48:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:48:15.484016    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:15.405888    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: W0417 20:43:07.361035    1353 kubelet.go:1497] Deleting mirror pod "kube-proxy-MYIP.118.240.129_kube-system(37537fb7-2159-11e7-b692-fa163e952b1c)" because it is outdated
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.018406    1353 event.go:208] Unable to write event: 'Post https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/events: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer' (may retry after sleeping)
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017813    1353 reflector.go:188] pkg/kubelet/kubelet.go:386: Failed to list *api.Node: Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017711    1353 reflector.go:188] pkg/kubelet/kubelet.go:378: Failed to list *api.Service: Get https://MYIP.118.240.122:443/api/v1/services?resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016457    1353 kubelet_node_status.go:302] Error updating node status, will retry: error getting node "MYIP.118.240.129": Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.0161MYIP    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8ea63b2-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.016165356 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8ea63b2-2159-11e7-b692-fa163e952b1c" (UID: "e8ea63b2-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016058    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015943    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec05331e-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015913703 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec05331e-2158-11e7-b692-fa163e952b1c" (UID: "ec05331e-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015843    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015732    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8fdcca4-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015656131 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8fdcca4-2159-11e7-b692-fa163e952b1c" (UID: "e8fdcca4-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015559    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015429    1353 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get https://MYIP.118.240.122:443/api/v1/pods?fieldSelector=spec.nodeName%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012918    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec091be8-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012889039 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec091be8-2158-11e7-b692-fa163e952b1c" (UID: "ec091be8-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012820    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012661    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec09da25-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012630687 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec09da25-2158-11e7-b692-fa163e952b1c" (UID: "ec09da25-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer

As you can see in the logs, the worker node is having trouble communicating with the master....

However, if I ssh into the worker and run a command like this:

core@philtest ~ $ curl https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7 --insecure
Unauthorized

It's TLS, so of course I didn't expect it to be authorized.
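A follow-up probe that mirrors how the kubelet actually authenticates can be more telling than an anonymous curl: present the worker's client certificate. This is a minimal sketch assuming the default file layout used by the coreos-kubernetes generic scripts (/etc/kubernetes/ssl/...); adjust the paths and master address to your environment.

```shell
# Hypothetical master address and the cert paths laid down by the
# coreos-kubernetes generic install scripts; adjust to your cluster.
MASTER=${MASTER:-MYIP.118.240.122}
SSL_DIR=${SSL_DIR:-/etc/kubernetes/ssl}

# An authenticated request should return JSON, not "Unauthorized".
curl --cacert "${SSL_DIR}/ca.pem" \
     --cert   "${SSL_DIR}/worker.pem" \
     --key    "${SSL_DIR}/worker-key.pem" \
     "https://${MASTER}:443/api/v1/nodes" \
  || echo "request failed: check certs, routing, and MTU"
```

If this succeeds while the kubelet still flaps, the problem is unlikely to be TLS; if it hangs only on large responses, that points at the network path rather than auth.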

Any suggestions on how to debug this?

Thanks!

You need to check that you added your IP address to the master's SSL generation file (openssl.cnf). Also try recreating your certificates with the IP of your DNS service (10.3.0.1 if you followed the CoreOS guide). Your openssl.cnf should look like this:

 [req]
 req_extensions = v3_req
 distinguished_name = req_distinguished_name
 [req_distinguished_name]
 [ v3_req ]
 basicConstraints = CA:FALSE
 keyUsage = nonRepudiation, digitalSignature, keyEncipherment
 subjectAltName = @alt_names
 [alt_names]
 DNS.1 = kubernetes
 DNS.2 = kubernetes.default
 DNS.3 = kubernetes.default.svc
 DNS.4 = kubernetes.default.svc.cluster.local
 IP.1 = 10.3.0.1
 IP.2 = PRIVATE_MASTER_IP
 IP.3 = PUBLIC_MASTER_IP

You also need to recreate the certificates for the nodes. Afterwards, delete the secrets from the namespace so they are regenerated automatically. Source: the CoreOS docs.
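Regenerating a worker certificate against the cluster CA can be sketched as below. This follows the CoreOS cluster-TLS walkthrough in spirit; WORKER_IP, the file names, and especially the throwaway CA are placeholders for demonstration — on a real cluster you would reuse the ca.pem / ca-key.pem created during the master install instead of generating a new CA.

```shell
set -e
WORKER_IP=${WORKER_IP:-10.0.0.5}   # placeholder: your worker's IP

# SAN config for the worker cert (same v3_req pattern as the master's).
cat > worker-openssl.cnf <<EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
IP.1 = ${WORKER_IP}
EOF

# Throwaway CA for a self-contained demo only; on a real cluster,
# reuse the CA generated during the master install.
openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -nodes -key ca-key.pem -days 365 \
  -out ca.pem -subj "/CN=kube-ca"

# Worker key + CSR, then sign it with the CA, carrying the SANs over.
openssl genrsa -out worker-key.pem 2048
openssl req -new -key worker-key.pem -out worker.csr \
  -subj "/CN=kube-worker" -config worker-openssl.cnf
openssl x509 -req -in worker.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out worker.pem -days 365 \
  -extensions v3_req -extfile worker-openssl.cnf

# Confirm the SAN actually made it into the signed cert:
openssl x509 -in worker.pem -noout -text | grep -A1 "Subject Alternative Name"
```

The `-extensions v3_req -extfile worker-openssl.cnf` pair on the final `x509` step is the part people usually miss: without it the signed certificate silently drops the subjectAltName entries.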

It turned out the problem was an inconsistent MTU setting in the OpenStack networking. Packets larger than roughly 1500 bytes were being dropped.
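An MTU mismatch like this can usually be confirmed from the worker itself. A minimal sketch, assuming a Linux guest (the ping/ip commands are left as comments because they need a reachable master address and root privileges, respectively; the 1450 figure is an assumed value typical of VXLAN overlays):

```shell
# List every interface's MTU so a mismatch between the instance and the
# overlay network stands out at a glance.
for dev in /sys/class/net/*; do
    printf '%s: mtu %s\n' "$(basename "$dev")" "$(cat "$dev/mtu")"
done

# From the worker, probe the path MTU toward the master with
# non-fragmenting pings (1472 payload + 28 bytes of IP/ICMP = 1500),
# shrinking -s until replies come back:
#   ping -c 3 -M do -s 1472 MASTER_IP
#
# If large packets are dropped, lowering the instance MTU to fit the
# overlay is one workaround (1450 is an assumed VXLAN-sized value):
#   sudo ip link set dev eth0 mtu 1450
```

This matches the symptom pattern in the question: small exchanges (the TLS handshake, the "Unauthorized" reply) succeed, while larger responses such as secret payloads die mid-stream with "connection reset by peer".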