Kubernetes - kube-system pods in master node keep restarting after worker node joins

I have followed this tutorial and this tutorial and this one, but I have been running into the same issue for the past 3 days.

I can set up the master node correctly with the following steps:

kubeadm init

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

Everything seems fine with:
kubectl get all --namespace=kube-system

Then,

on the worker node:

kubeadm join --token 864655.fdf6d0b389867b79 192.168.100.17:6443 --discovery-token-ca-cert-hash sha256:a2d840808b17b53b9612e6271ccde489f13dbede7d354f97188d0faa9e210af2

The output looks fine, as follows:

[preflight] Running pre-flight checks.
  [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "192.168.100.17:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.100.17:6443"
[discovery] Requesting info from "https://192.168.100.17:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.100.17:6443"
[discovery] Successfully established connection with API Server "192.168.100.17:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

BUT as soon as I run this command, everything falls apart.

kubectl get all --namespace=kube-system

starts showing that all the pods keep restarting. The status keeps changing between Pending and Running, sometimes some pods even disappear, and some may show a ContainerCreating status, etc.

NAME                                READY     STATUS    RESTARTS   AGE
po/etcd-ubuntu                      0/1       Pending   0          0s
po/kube-controller-manager-ubuntu   0/1       Pending   0          0s
po/kube-dns-6f4fd4bdf-cmcfk         3/3       Running   0          13m
po/kube-proxy-2chb6                 1/1       Running   0          13m
po/kube-scheduler-ubuntu            0/1       Pending   0          0s
po/weave-net-ptdxr                  2/2       Running   0          11m
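
(One way to watch this churn live, in case it helps anyone reproducing it, is something like the following; -o wide also shows which node and IP each pod gets assigned:)

kubectl get pods --namespace=kube-system -o wide -w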

I also tried the second tutorial, with flannel, and got exactly the same problem.
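
(For reference, the flannel variant boils down to roughly the following; this is a sketch using flannel's default pod CIDR and the manifest URL from its docs at the time, not my exact commands:)

kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml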

My setup

I created two new VMs with fresh installs of Ubuntu 17.10 on VMware, each with 2 processors/2 cores, 6 GB of RAM, and a 50 GB hard disk. My physical machine is an i7-6700k with 32 GB of RAM. I installed kubeadm, kubelet, and docker on both of them and then followed the steps mentioned above.
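
(For completeness, the package install on both VMs was roughly along these lines; this is a sketch based on the standard Ubuntu install instructions of the time, not my exact shell history:)

# add the Kubernetes apt repo and install the packages
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y docker.io kubelet kubeadm kubectl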

I also tried switching between NAT and Bridged networking in VMware, but nothing changed.

The initial IPs of the two VMs with bridged networking were 192.168.100.12 and 192.168.100.17. hostname -I for the master:

192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2

hostname -I for the worker node:

192.168.100.12 172.17.0.1 10.44.0.0 10.32.0.1

journalctl -xeu kubelet shows the following:

https://gist.github.com/saad749/9a771a3460bf88c274498b5bc4b7fd84

While trying with flannel (still the same issue), the results of

kubectl describe nodes

https://gist.github.com/saad749/d24c453c8b4e663e9abf572a0fb38bf4

Am I missing any step before kubeadm init? Should I change the IP addresses (and to what)? Are there any specific logs I should look into? Is there a more comprehensive tutorial for this? All the issues start after kubeadm join on the worker node; I can deploy kubernetes on the master node and anything else, and it works fine.

UPDATE:

The same issue persists even after applying the advice from errordeveloper.

I added the following flag to kubeadm init:

--apiserver-advertise-address 192.168.100.17
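
That meant resetting and re-initializing the master, roughly like this (a sketch; kubeadm reset wipes the existing cluster state before the new init):

sudo kubeadm reset
sudo kubeadm init --apiserver-advertise-address 192.168.100.17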

I updated kubeadm.conf to the following, then reloaded and restarted: https://gist.github.com/saad749/c7149c87ec3e75a40586f626cf04279a

and also tried changing the cluster DNS: https://gist.github.com/saad749/5fa66bebc22841e58119333e75600e40

Logs after initializing the master:

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      1/1       Running   0          22s       192.168.100.17   ubuntu
kube-system   kube-apiserver-ubuntu            1/1       Running   0          29s       192.168.100.17   ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          13s       192.168.100.17   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         3/3       Running   0          1m        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          1m        192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running   0          34s       192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   0          32s       192.168.100.17   ubuntu

Results of hostname -I and hostname -i:

kube-master@ubuntu:~$ hostname -I
192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2 10.32.0.3 10.32.0.4 10.32.0.5 10.32.0.6 10.244.0.0 10.244.0.1
kube-master@ubuntu:~$ hostname -i
192.168.100.17

Results from:

kubectl describe nodes

https://gist.github.com/saad749/8f460650182a04d0ddf3158a52761a9a

The internal IP seems to be correct now.

After joining from the second node, this happens:

kube-master@ubuntu:~$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
ubuntu    Ready     master    49m       v1.9.3
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE       IP               NODE
kube-system   kube-controller-manager-ubuntu   0/1       Pending             0          0s        <none>           ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         0/3       ContainerCreating   0          49m       <none>           ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running             0          49m       192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running             0          1s        192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running             0          48m       192.168.100.17   ubuntu

ifconfig -a results:

https://gist.github.com/saad749/63a5a52bd3246ff72477b2aca7d158d0

journalctl -xeu kubelet results:

https://gist.github.com/saad749/8a60870b35f93df8565e66cb208aff32

Sometimes the pod IPs show up as 192.168.100.12, which is the IP of the second, non-master node:

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-apiserver-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          0s        192.168.100.12   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         2/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   kube-scheduler-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   1          3h        192.168.100.17   ubuntu

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                       READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   kube-dns-6f4fd4bdf-wfqhb   3/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9           1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   weave-net-fkgnh            2/2       Running   0          3h        192.168.100.12   ubuntu


kubectl describe nodes
Name:               ubuntu
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=ubuntu
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             node-role.kubernetes.io/master:NoSchedule
CreationTimestamp:  Fri, 02 Mar 2018 08:21:47 -0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 11:28:25 -0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.100.12
  Hostname:    ubuntu
Capacity:
 cpu:     4
 memory:  6080832Ki
 pods:    110
Allocatable:
 cpu:     4
 memory:  5978432Ki
 pods:    110
System Info:
 Machine ID:                 59bf65b835b242a3aa182f4b8a542219
 System UUID:                0C3C4D56-4747-D59E-EE09-F16F2793677E
 Boot ID:                    658b4a08-d724-425e-9246-2b41995ecc46
 Kernel Version:             4.13.0-36-generic
 OS Image:                   Ubuntu 17.10
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.9.3
 Kube-Proxy Version:         v1.9.3
ExternalID:                  ubuntu
Non-terminated Pods:         (3 in total)
  Namespace                  Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                        ------------  ----------  ---------------  -------------
  kube-system                kube-dns-6f4fd4bdf-wfqhb    260m (6%)     0 (0%)      110Mi (1%)       170Mi (2%)
  kube-system                kube-proxy-h4hz9            0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                weave-net-fkgnh             20m (0%)      0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  280m (7%)     0 (0%)      110Mi (1%)       170Mi (2%)
Events:
  Type     Reason                   Age                 From             Message
  ----     ------                   ----                ----             -------
  Warning  Rebooted                 12m (x814 over 2h)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
  Normal   NodeHasNoDiskPressure    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasNoDiskPressure
  Normal   Starting                 10m                 kubelet, ubuntu  Starting kubelet.
  Normal   NodeAllocatableEnforced  10m                 kubelet, ubuntu  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientDisk    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory  10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientMemory
  Normal   NodeNotReady             10m                 kubelet, ubuntu  Node ubuntu status is now: NodeNotReady
  Warning  Rebooted                 2m (x870 over 2h)   kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 658b4a08-d724-425e-9246-2b41995ecc46
  Warning  Rebooted                 15s (x60 over 10m)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0

What am I doing wrong?

Should I change the IP addresses (to what)?

Yes, this is typically what you have to do when running on VMs where the default route is used for NATed access to the Internet.

You want to use the IP from the bridged network; for your master that appears to be 192.168.100.17 (but please double-check).

First, please try kubeadm init --apiserver-advertise-address 192.168.100.17, but that may not solve all of the issues.

In your output of kubectl describe nodes, I can see this:

Addresses:
  InternalIP:  172.17.0.1
  Hostname:    ubuntu

So you may want to make sure the kubelet doesn't use the NATed interface either, for which you would need to use the kubelet's --node-ip flag.
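
A rough sketch of one way to wire that up (the drop-in path and the exact variable may differ between kubeadm versions, so treat this as an example rather than the exact file on your system):

# e.g. add this to the kubelet's systemd drop-in,
# typically /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.100.17"

# then reload systemd and restart the kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet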

However, there are other ways of fixing this, e.g. if you can ensure that hostname -i returns the IP of the bridged interface (which you can do by tweaking /etc/hosts).
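
For example, an /etc/hosts along these lines on the master (a sketch using the hostname and bridged IP from your outputs) makes hostname -i resolve to the bridged address instead of Ubuntu's default 127.0.1.1 entry:

# /etc/hosts on the master
127.0.0.1        localhost
192.168.100.17   ubuntu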

So after following the advice from @errordeveloper and still hitting the wall, I was able to solve the issue, which turned out to be pretty basic.

Both of my VMs had the same hostname.

hostname -f 

would return

ubuntu

on both, and apparently this causes issues with kubernetes.

I changed the name on the non-master node with

hostnamectl set-hostname kminion

and in the following files (example entries are shown below):

/etc/hostname
/etc/hosts
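
For anyone else hitting this, the end state on the worker should look roughly like this (a sketch, assuming the worker keeps its bridged IP 192.168.100.12):

# /etc/hostname
kminion

# /etc/hosts
127.0.0.1        localhost
192.168.100.12   kminion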

After that, everything went smoothly!