Kubernetes

Question

I have followed this tutorial and this tutorial and this one, but I have been running into the same problem for the past 3 days.

I can set up the master node correctly with the following steps:

kubeadm init

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
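
The tr -d '\n' step matters because GNU base64 wraps its output at 76 characters by default, and a line break inside the k8s-version query parameter would break the URL. Below is a minimal sketch of the same encoding step as a reusable function. The function name encode_for_query is made up for illustration and is not part of kubectl or Weave.

```shell
#!/bin/sh
# encode_for_query: hypothetical helper, not part of kubectl or Weave.
# Reads stdin, base64-encodes it, and strips the line breaks that
# base64 inserts when it wraps long output.
encode_for_query() {
    base64 | tr -d '\n'
}

# Usage on the master (requires a working kubectl):
#   kubever=$(kubectl version | encode_for_query)
#   kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
```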

Everything seems fine:

kubectl get all --namespace=kube-system

Then, on the worker node:

kubeadm join --token 864655.fdf6d0b389867b79 192.168.100.17:6443 --discovery-token-ca-cert-hash sha256:a2d840808b17b53b9612e6271ccde489f13dbede7d354f97188d0faa9e210af2

The output looks fine, as shown below:

[preflight] Running pre-flight checks.
  [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "192.168.100.17:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.100.17:6443"
[discovery] Requesting info from "https://192.168.100.17:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.100.17:6443"
[discovery] Successfully established connection with API Server "192.168.100.17:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

BUT as soon as I run this command, everything falls apart.

kubectl get all --namespace=kube-system

It starts showing all the pods restarting. The status keeps flipping between Pending and Running, sometimes some pods even disappear, and some may show ContainerCreating and so on:

NAME                                READY     STATUS    RESTARTS   AGE
po/etcd-ubuntu                      0/1       Pending   0          0s
po/kube-controller-manager-ubuntu   0/1       Pending   0          0s
po/kube-dns-6f4fd4bdf-cmcfk         3/3       Running   0          13m
po/kube-proxy-2chb6                 1/1       Running   0          13m
po/kube-scheduler-ubuntu            0/1       Pending   0          0s
po/weave-net-ptdxr                  2/2       Running   0          11m

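I also tried the second tutorial, which uses flannel, and got exactly the same problem.

My setup

I created two new VMs on VMware with fresh installs of Ubuntu 17.10, each with 2 processors / 2 cores, 6 GB of RAM and a 50 GB disk. My physical machine is an i7-6700k with 32 GB of RAM. I installed kubeadm, kubelet and docker on both of them and then followed the steps mentioned above.

I also tried switching between NAT and Bridge in VMware, but nothing changed.

With bridged networking, the initial IPs of the two VMs are 192.168.100.12 and 192.168.100.17. hostname -I for the master: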

192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2

hostname -I for the worker node:

192.168.100.12 172.17.0.1 10.44.0.0 10.32.0.1

journalctl -xeu kubelet shows the following:

https://gist.github.com/saad749/9a771a3460bf88c274498b5bc4b7fd84

While trying flannel (still the same problem), the output of

kubectl describe nodes

is

https://gist.github.com/saad749/d24c453c8b4e663e9abf572a0fb38bf4

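Am I missing any steps before kubeadm init? Should I change the IP addresses (to what)? Are there any specific logs I should look at? Is there a more comprehensive tutorial? All the problems start after kubeadm join on the worker node; before that, I can deploy on the master node and it works fine.

Update:

Even after applying errordeveloper's suggestions, the same problem persists.

I added the following flag to kubeadm init: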

--apiserver-advertise-address 192.168.100.17

I updated kubeadm.conf to the following, then reloaded and restarted: https://gist.github.com/saad749/c7149c87ec3e75a40586f626cf04279a

I also tried changing the cluster DNS: https://gist.github.com/saad749/5fa66bebc22841e58119333e75600e40

Log after initializing the master:

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      1/1       Running   0          22s       192.168.100.17   ubuntu
kube-system   kube-apiserver-ubuntu            1/1       Running   0          29s       192.168.100.17   ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          13s       192.168.100.17   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         3/3       Running   0          1m        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          1m        192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running   0          34s       192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   0          32s       192.168.100.17   ubuntu

Results of hostname -I and hostname -i:

kube-master@ubuntu:~$ hostname -I
192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2 10.32.0.3 10.32.0.4 10.32.0.5 10.32.0.6 10.244.0.0 10.244.0.1
kube-master@ubuntu:~$ hostname -i
192.168.100.17

Result of:

kubectl describe nodes

https://gist.github.com/saad749/8f460650182a04d0ddf3158a52761a9a

The internal IP now appears to be correct.

After the second node joins, this happens:

kube-master@ubuntu:~$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
ubuntu    Ready     master    49m       v1.9.3
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE       IP               NODE
kube-system   kube-controller-manager-ubuntu   0/1       Pending             0          0s        <none>           ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         0/3       ContainerCreating   0          49m       <none>           ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running             0          49m       192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running             0          1s        192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running             0          48m       192.168.100.17   ubuntu

ifconfig -a result:

https://gist.github.com/saad749/63a5a52bd3246ff72477b2aca7d158d0

journalctl -xeu kubelet result:

https://gist.github.com/saad749/8a60870b35f93df8565e66cb208aff32

Sometimes the pod IPs show up as 192.168.100.12, which is the IP of the second, non-master node:

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-apiserver-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          0s        192.168.100.12   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         2/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   kube-scheduler-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   1          3h        192.168.100.17   ubuntu

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                       READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   kube-dns-6f4fd4bdf-wfqhb   3/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9           1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   weave-net-fkgnh            2/2       Running   0          3h        192.168.100.12   ubuntu


kubectl describe nodes
Name:               ubuntu
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=ubuntu
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             node-role.kubernetes.io/master:NoSchedule
CreationTimestamp:  Fri, 02 Mar 2018 08:21:47 -0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 11:28:25 -0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.100.12
  Hostname:    ubuntu
Capacity:
 cpu:     4
 memory:  6080832Ki
 pods:    110
Allocatable:
 cpu:     4
 memory:  5978432Ki
 pods:    110
System Info:
 Machine ID:                 59bf65b835b242a3aa182f4b8a542219
 System UUID:                0C3C4D56-4747-D59E-EE09-F16F2793677E
 Boot ID:                    658b4a08-d724-425e-9246-2b41995ecc46
 Kernel Version:             4.13.0-36-generic
 OS Image:                   Ubuntu 17.10
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.9.3
 Kube-Proxy Version:         v1.9.3
ExternalID:                  ubuntu
Non-terminated Pods:         (3 in total)
  Namespace                  Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                        ------------  ----------  ---------------  -------------
  kube-system                kube-dns-6f4fd4bdf-wfqhb    260m (6%)     0 (0%)      110Mi (1%)       170Mi (2%)
  kube-system                kube-proxy-h4hz9            0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                weave-net-fkgnh             20m (0%)      0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  280m (7%)     0 (0%)      110Mi (1%)       170Mi (2%)
Events:
  Type     Reason                   Age                 From             Message
  ----     ------                   ----                ----             -------
  Warning  Rebooted                 12m (x814 over 2h)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
  Normal   NodeHasNoDiskPressure    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasNoDiskPressure
  Normal   Starting                 10m                 kubelet, ubuntu  Starting kubelet.
  Normal   NodeAllocatableEnforced  10m                 kubelet, ubuntu  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientDisk    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory  10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientMemory
  Normal   NodeNotReady             10m                 kubelet, ubuntu  Node ubuntu status is now: NodeNotReady
  Warning  Rebooted                 2m (x870 over 2h)   kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 658b4a08-d724-425e-9246-2b41995ecc46
  Warning  Rebooted                 15s (x60 over 10m)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0

What am I doing wrong?

Answer 1

Should I change the IP addresses (to what)?

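Yes, that is usually what you need to do when running on VMs whose default route reaches the Internet through NAT.

You want to use the IP of the bridged network, which for your master appears to be 192.168.100.17 (but please double-check).

First, try kubeadm init --apiserver-advertise-address 192.168.100.17, although this may not solve everything.

In the output of your kubectl describe nodes, I can see this: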

Addresses:
  InternalIP:  172.17.0.1
  Hostname:    ubuntu

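So you probably want to make sure the kubelet does not use the NATed interface either; for that you need to use the kubelet's --node-ip flag.

However, there are other ways to fix this, for example making sure that hostname -i returns the IP of the bridged interface (you can do that by adjusting /etc/hosts).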
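
To make the --node-ip suggestion concrete, here is a small sketch that picks the bridged address out of hostname -I style output so it can be passed to the flag. The helper name pick_ip_by_prefix and the 192.168.100. prefix are assumptions based on the addresses in the question; how the flag actually reaches the kubelet (for example through a systemd drop-in) depends on your installation.

```shell
#!/bin/sh
# pick_ip_by_prefix: hypothetical helper for illustration only.
# $1 = subnet prefix to match, e.g. "192.168.100."
# $2 = space-separated address list, e.g. the output of `hostname -I`
# Prints the first address that starts with the prefix.
pick_ip_by_prefix() {
    prefix=$1
    for addr in $2; do
        case $addr in
            "$prefix"*) printf '%s\n' "$addr"; return 0 ;;
        esac
    done
    return 1   # no address on that subnet
}

# Usage on a node:
#   node_ip=$(pick_ip_by_prefix "192.168.100." "$(hostname -I)")
#   echo "--node-ip=$node_ip"
```

If the function finds nothing, it returns a non-zero status, which is a hint that the VM is not attached to the bridged network at all.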

Answer 2

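So after following @errordeveloper's advice and still hitting a wall, I was able to solve this, and the problem turned out to be very simple.

Both of my VMs had the same hostname.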

hostname -f

would return

ubuntu

on both of them, which apparently causes problems for Kubernetes.
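
One hint in the question's own output points the same way: the Rebooted events in kubectl describe nodes alternate between two different boot IDs for a single node named ubuntu, which is consistent with two machines reporting under one name. Below is a rough sketch for spotting this, assuming the event lines contain "boot id: <id>" as they do in the question. The function name count_boot_ids is made up.

```shell
#!/bin/sh
# count_boot_ids: hypothetical helper for illustration only.
# Reads `kubectl describe node <name>` output on stdin and prints how
# many distinct boot IDs appear in its "has been rebooted" events.
count_boot_ids() {
    sed -n 's/.*boot id: \([0-9a-f-]*\).*/\1/p' | sort -u | wc -l | tr -d ' '
}

# Usage on the master, for the single node named "ubuntu":
#   kubectl describe node ubuntu | count_boot_ids
# More than one ID for what should be one machine is suspicious.
```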

I changed the name on the non-master node with

hostnamectl set-hostname kminion

and in the following files:

/etc/hostname
/etc/hosts

And everything went fine!
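
Since the kubelet registers each node under its hostname by default, a quick check before running kubeadm join is to collect the hostname -f output from every machine and look for repeats. A minimal sketch, using the hypothetical helper name find_duplicate_hostnames:

```shell
#!/bin/sh
# find_duplicate_hostnames: hypothetical helper for illustration only.
# Reads one hostname per line on stdin and prints each name that
# occurs more than once.
find_duplicate_hostnames() {
    sort | uniq -d
}

# Usage, after gathering `hostname -f` output from each VM by hand:
#   printf '%s\n' ubuntu ubuntu | find_duplicate_hostnames
```

Empty output means every machine has a distinct name.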


Kubernetes - kube-system pods in master node keep restarting after worker node joins

weave

flannel

kubeadm