kubeadm init shows kubelet isn't running or healthy

I am trying to run Kubernetes and ran sudo kubeadm init. Swap is turned off, as recommended by the official documentation.

The problem is that it prints the following warning:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.


Unfortunately, an error has occurred:
            timed out waiting for the condition

This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
            - No internet connection is available so the kubelet cannot pull or find the following control plane images:
                - k8s.gcr.io/kube-apiserver-amd64:v1.11.2
                - k8s.gcr.io/kube-controller-manager-amd64:v1.11.2
                - k8s.gcr.io/kube-scheduler-amd64:v1.11.2
                - k8s.gcr.io/etcd-amd64:3.2.18
                - You can check or miligate this in beforehand with "kubeadm config images pull" to make sure the images
                  are downloaded locally and cached.

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
        Here is one example how you may list all Kubernetes containers running in docker:
            - 'docker ps -a | grep kube | grep -v pause'
            Once you have found the failing container, you can inspect its logs with:
            - 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster

The Docker version I am using is Docker version 17.03.2-ce, build f5ec1e2, and I am on Ubuntu 16.04 LTS 64-bit.

docker images shows the following images:

REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-apiserver-amd64            v1.11.2             821507941e9c        3 weeks ago         187 MB
k8s.gcr.io/kube-controller-manager-amd64   v1.11.2             38521457c799        3 weeks ago         155 MB
k8s.gcr.io/kube-proxy-amd64                v1.11.2             46a3cd725628        3 weeks ago         97.8 MB
k8s.gcr.io/kube-scheduler-amd64            v1.11.2             37a1403e6c1a        3 weeks ago         56.8 MB
k8s.gcr.io/coredns                         1.1.3               b3b94275d97c        3 months ago        45.6 MB
k8s.gcr.io/etcd-amd64                      3.2.18              b8df3b177be2        4 months ago        219 MB
k8s.gcr.io/pause                           3.1                 da86e6ba6ca1        8 months ago        742 kB

The full log can be found here: https://pastebin.com/T5V0taE3

I have not found any solution on the internet.

Edit:

docker ps -a output:

ubuntu@ubuntu-HP-Pavilion-15-Notebook-PC:~$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

journalctl -xeu kubelet output:

journalctl -xeu kubelet
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit kubelet.service has finished shutting down.
Sep 01 10:40:05 ubuntu-HP-Pavilion-15-Notebook-PC systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit kubelet.service has finished starting up.
-- 
-- The start-up result is done.
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: Flag --cgroup-driver has been deprecated, This parameter should be set via the
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: Flag --cgroup-driver has been deprecated, This parameter should be set via the
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: I0901 10:40:06.117131    9107 server.go:408] Version: v1.11.2
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: I0901 10:40:06.117406    9107 plugins.go:97] No cloud provider specified.
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: I0901 10:40:06.121192    9107 certificate_store.go:131] Loading cert/key pair 
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: I0901 10:40:06.145720    9107 server.go:648] --cgroups-per-qos enabled, but --
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC kubelet[9107]: F0901 10:40:06.146074    9107 server.go:262] failed to run Kubelet: Running wi
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC systemd[1]: kubelet.service: Unit entered failed state.
Sep 01 10:40:06 ubuntu-HP-Pavilion-15-Notebook-PC systemd[1]: kubelet.service: Failed with result 'exit-code'.

Any help, suggestion, or comment would be appreciated.

Since you are using Kubernetes 1.11.2, this quote from CHANGELOG-1.11.md is useful:

kubeadm now detects the Docker cgroup driver and starts the kubelet with the matching driver. This eliminates a common error experienced by new users in when the Docker cgroup driver is not the same as the one set for the kubelet due to different Linux distributions setting different cgroup drivers for Docker, making it hard to start the kubelet properly.

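It looks to me as if some wrong parameter is being passed to the kubelet on your node, and the kubelet fails to start because of it.

  1. First check which cgroup driver your Docker installation uses: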

    docker info | grep -i cgroup
    

The output should be:

Cgroup Driver: cgroupfs

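  2. Now find the kubelet service unit on your node, probably /etc/systemd/system/kubelet.service (or a similar name), and remove every cgroup-related parameter from it.

  3. Try restarting the kubelet service.

  4. Check the kubelet logs again (journalctl -xeu kubelet).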
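
The driver check in the first step can also be scripted. The following is a minimal sketch and not part of the original answer: the function name cgroup_driver_from_info is hypothetical, and it assumes docker info prints a line of the form "Cgroup Driver: <name>", as in the output shown above.

```shell
# Print the cgroup driver named in `docker info` output read from stdin.
# Assumes a line of the form "Cgroup Driver: <name>" (leading spaces allowed).
cgroup_driver_from_info() {
    sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p' | head -n 1
}

# Typical use on a node where Docker is installed:
#   docker info 2>/dev/null | cgroup_driver_from_info
```

The printed name can then be compared by eye with any cgroup-driver setting found in the kubelet unit file.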

The error was fixed by:

sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab

Then reboot the machine.
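
For context, swapoff -a only disables swap on the running system, while swap entries in /etc/fstab are activated again at the next boot, which is why the fstab edit matters. The sketch below is a slightly more cautious variant of the sed command above, not the answerer's own script: the function name comment_out_swap is hypothetical, it skips lines that are already commented, and it keeps a .bak copy of the file.

```shell
# Comment out every active swap entry in an fstab-style file given as $1.
# Lines that already start with '#' are left alone, so re-running is harmless.
comment_out_swap() {
    fstab="$1"
    # -i.bak edits in place and keeps the original as "$fstab.bak"
    sed -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Try it on a copy first; the path below is only an example:
#   cp /etc/fstab /tmp/fstab.test && comment_out_swap /tmp/fstab.test
```

Once the result on the copy looks right, the same call can be run with sudo against /etc/fstab itself.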

sudo sed -i '/ swap / s/^/#/' /etc/fstab

I spent several days on the same problem, and this worked for me too. I don't know why 'sudo swapoff -a' alone does not have the same effect.

[kubelet-check] It seems like the kubelet isn't running or healthy. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

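And these error logs are really confusing, because nothing in them hints that the problem might be related to the swap configuration.

If you are running on OpenStack or any other cloud, make sure the security group allows this port: TCP | Inbound | 6443* | Kubernetes API server

In my case, this was the error.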

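You should not run with swap configured, even if it has been turned off with swapoff -a.

You should also disable it in the configuration file /etc/fstab.

For me:

root@kali:~# cat /etc/fstab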


UUID=ce70d41a-0ce7-42bb-a318-d89369f93b28 / ext4 errors=remount-ro 0 1

#swap was on /dev/sda5 during installation

UUID=2271021b-2aed-4b49-9757-54e7d42ef33e none swap sw 0 0

/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0

After editing:

root@kali:~# cat /etc/fstab

UUID=ce70d41a-0ce7-42bb-a318-d89369f93b28 / ext4 errors=remount-ro 0 1

#swap was on /dev/sda5 during installation

#UUID=2271021b-2aed-4b49-9757-54e7d42ef33e none swap sw 0 0

/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0

Then reboot your machine.

Note: after rebooting, don't forget to run a reset before kubeadm init:

kubeadm reset
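
After the reboot it can be worth confirming that no swap device is active before running kubeadm again. This is a minimal sketch, not from the original answer: the function name swap_is_off is hypothetical, and it assumes the Linux /proc/swaps layout (one header line followed by one line per active swap device).

```shell
# Succeed (exit status 0) when the given /proc/swaps-style file lists no devices.
# The first line is a column header; every further non-empty line is a device.
swap_is_off() {
    awk 'NR > 1 && NF > 0 { n++ } END { exit (n > 0) }' "${1:-/proc/swaps}"
}

# After the reboot:
#   if swap_is_off; then echo "swap is off"; else echo "swap is still on"; fi
```

If the check reports that swap is still on, the swap line in /etc/fstab is probably still active.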

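I ran into exactly the same problem. Turning swap off solves it, but I worked around it in a different way, so I am posting my solution here for reference.

Swap is checked in two stages: once by the kubeadm command-line tool and once by the kubelet service.

So if we do not turn swap off, kubeadm shows the message below: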

[ERROR Swap]: running with swap on is not supported. Please disable swap

It then aborts the init or join process. The kubeadm check can be skipped by passing the argument "--ignore-preflight-errors=Swap" to kubeadm. Here is an example:

sudo kubeadm join 10.50.10.198:6443 --token XXXX.XXXXXa     --discovery-token-ca-cert-hash sha256:XX48cb7c381 --ignore-preflight-errors=Swap

However, if we run this, the kubelet service still blocks us, and the problem described in this thread appears. We can add a configuration file to avoid that:

 cd /etc/systemd/system/kubelet.service.d
 touch 20-allow-swap.conf

Add the following content to this file:

[Service]
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"

Then:

 systemctl daemon-reload
 systemctl restart kubelet

Finally, don't forget to run kubeadm join again, with "--ignore-preflight-errors=Swap".
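
The drop-in file can also be written in one step, which avoids pasting typographic quotes into it by accident. This is a minimal sketch of the steps above and not the answerer's own script: the function name write_allow_swap_dropin is hypothetical, and the directory argument exists only so the function can be tried against a scratch directory first.

```shell
# Write a systemd drop-in that passes --fail-swap-on=false to the kubelet.
# $1 is the drop-in directory; it defaults to the path used in the answer above.
write_allow_swap_dropin() {
    dropin_dir="${1:-/etc/systemd/system/kubelet.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # The quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$dropin_dir/20-allow-swap.conf" <<'EOF'
[Service]
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
EOF
}

# Dry run against a scratch directory (example path):
#   write_allow_swap_dropin /tmp/kubelet.service.d
```

Writing to the real directory needs root, and systemctl daemon-reload followed by systemctl restart kubelet is still required afterwards.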

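I ran into a similar problem recently. The cause was the cgroup driver: the Kubernetes (kubelet) cgroup driver was set to systemd, but Docker was using a different driver. So I created /etc/docker/daemon.json and added the following: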

{
    "exec-opts": ["native.cgroupdriver=systemd"]
}

Then:

 sudo systemctl daemon-reload
 sudo systemctl restart docker
 sudo systemctl restart kubelet

Run kubeadm init or kubeadm join again.
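
Before restarting Docker, it can help to confirm which driver daemon.json actually requests. This is a minimal sketch, not part of the original answer: the function name daemon_json_cgroup_driver is hypothetical, and it uses a plain text match rather than a JSON parser, so it assumes the option is written as native.cgroupdriver=<name>, as in the snippet above.

```shell
# Print the cgroup driver requested via "exec-opts" in a Docker daemon.json.
# $1 is the file to read; nothing is printed if native.cgroupdriver is not set.
daemon_json_cgroup_driver() {
    sed -n 's/.*native\.cgroupdriver=\([A-Za-z0-9_-]*\).*/\1/p' \
        "${1:-/etc/docker/daemon.json}" | head -n 1
}

# Example:
#   daemon_json_cgroup_driver /etc/docker/daemon.json
```

After the restart, docker info | grep -i cgroup should report the same driver name.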

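I hit the same problem when trying to initialize my k8s cluster. In my case, the error came from Docker and the kubelet using inconsistent cgroup drivers.

To fix it, first find Docker's cgroup driver: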

docker info | grep Cgroup

The output of the command above looks like this:

Cgroup Driver: cgroupfs
Cgroup Version: 1

Then update the kubelet args (KUBELET_KUBECONFIG_ARGS) in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, adding the --cgroup-driver flag that matches Docker's cgroup driver (cgroupfs in this case).

After the change, my config file looks like this:

...
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=cgroupfs"
...

Finally, run kubeadm reset and then kubeadm init.
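
To double-check the edit, the flag in the drop-in can be read back and compared with the driver Docker reports. This is a minimal sketch under stated assumptions, not part of the original answer: both function names are hypothetical, and the flag is assumed to be written as --cgroup-driver=<name> on a single line, as in the config excerpt above.

```shell
# Print the value of --cgroup-driver found in the kubelet drop-in file $1, if any.
kubelet_cgroup_driver() {
    sed -n 's/.*--cgroup-driver=\([A-Za-z0-9_-]*\).*/\1/p' "$1" | head -n 1
}

# Succeed when the two driver names are non-empty and identical.
drivers_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example, using the path from the answer above and the value Docker reported:
#   k=$(kubelet_cgroup_driver /etc/systemd/system/kubelet.service.d/10-kubeadm.conf)
#   drivers_match "$k" cgroupfs && echo "drivers match" || echo "drivers differ"
```

An empty result from kubelet_cgroup_driver means the flag is not set in that file at all.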

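I had to modify the ExecStart option in /usr/lib/systemd/system/docker.service, as shown in one of the replies here.

And very important: I had to delete the file /etc/docker/daemon.json, which I had created while trying to fix this problem. Otherwise, an error occurs after modifying /usr/lib/systemd/system/docker.service, because the same option is also set in daemon.json.

Create /etc/docker/daemon.json,

add the content below, and restart docker: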

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}