kubelet fails to get cgroup stats for docker and kubelet services
I'm running Kubernetes on bare-metal Debian (3 masters, 2 workers, currently a PoC). I followed k8s-the-hard-way, and I'm running into the following problem on my kubelet:
Failed to get system container stats for
"/system.slice/docker.service": failed to get cgroup stats for
"/system.slice/docker.service": failed to get cgroup stats for
"/system.slice/docker.service": failed to get container info for
"/system.slice/docker.service": unknown container
"/system.slice/docker.service"
I get the same message for kubelet.service.
I do have the files for those cgroups:
$ ls /sys/fs/cgroup/systemd/system.slice/docker.service
cgroup.clone_children cgroup.procs notify_on_release tasks
$ ls /sys/fs/cgroup/systemd/system.slice/kubelet.service/
cgroup.clone_children cgroup.procs notify_on_release tasks
And cAdvisor tells me:
$ curl http://127.0.0.1:4194/validate
cAdvisor version:
OS version: Debian GNU/Linux 8 (jessie)
Kernel version: [Supported and recommended]
Kernel version is 3.16.0-4-amd64. Versions >= 2.6 are supported. 3.0+ are recommended.
Cgroup setup: [Supported and recommended]
Available cgroups: map[cpu:1 memory:1 freezer:1 net_prio:1 cpuset:1 cpuacct:1 devices:1 net_cls:1 blkio:1 perf_event:1]
Following cgroups are required: [cpu cpuacct]
Following other cgroups are recommended: [memory blkio cpuset devices freezer]
Hierarchical memory accounting enabled. Reported memory usage includes memory used by child containers.
Cgroup mount setup: [Supported and recommended]
Cgroups are mounted at /sys/fs/cgroup.
Cgroup mount directories: blkio cpu cpu,cpuacct cpuacct cpuset devices freezer memory net_cls net_cls,net_prio net_prio perf_event systemd
Any cgroup mount point that is detectible and accessible is supported. /sys/fs/cgroup is recommended as a standard location.
Cgroup mounts:
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
Managed containers:
/kubepods/burstable/pod76099b4b-af57-11e7-9b82-fa163ea0076a
/kubepods/besteffort/pod6ed4ee49-af53-11e7-9b82-fa163ea0076a/f9da6bf60a186c47bd704bbe3cc18b25d07d4e7034d185341a090dc3519c047a
Namespace: docker
Aliases:
k8s_tiller_tiller-deploy-cffb976df-5s6np_kube-system_6ed4ee49-af53-11e7-9b82-fa163ea0076a_1
f9da6bf60a186c47bd704bbe3cc18b25d07d4e7034d185341a090dc3519c047a
/kubepods/burstable/pod76099b4b-af57-11e7-9b82-fa163ea0076a/956911118c342375abfb7a07ec3bb37451bbc64a1e141321b6284cf5049e385f
Edit
Disabling the cAdvisor port on the kubelet (--cadvisor-port=0) does not fix the issue.
Try to start the kubelet with:
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
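For reference, a minimal sketch of how those flags could be wired in via a systemd drop-in (the file name 10-cgroup-fix.conf and the placeholder <your existing flags> are illustrative, not from the original answer):
# /etc/systemd/system/kubelet.service.d/10-cgroup-fix.conf
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet <your existing flags> --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
Then reload systemd and restart the kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet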
I'm using this workaround on RHEL7 with Kubelet 1.8.0 and Docker 1.12. Besides this change, I also had to run yum update for it to take effect. That may help others trying this workaround.
angeloxx's workaround also works on kops' default AWS image (k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02 (ami-bd229ec4)):
sudo vim /etc/sysconfig/kubelet
Add to the end of the DAEMON_ARGS string:
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
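For illustration, the edited /etc/sysconfig/kubelet would then look roughly like this (the ellipsis stands in for whatever flags the image already sets; only the two cgroup flags are the actual change):
DAEMON_ARGS="... --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"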
Finally:
sudo systemctl restart kubelet
To take this a step further: on the kops AMI kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08, following the same approach as above, I had to add the following to the end of the DAEMON_ARGS string instead:
--runtime-cgroups=/lib/systemd/system/kubelet.service --kubelet-cgroups=/lib/systemd/system/kubelet.service
Then:
sudo systemctl restart kubelet
But I found that I still got:
Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Restarting dockerd resolved that error:
sudo systemctl restart docker
Thanks!
After some more digging, I found a better way: add this to the kops configuration:
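(The actual snippet appears to have been lost from this post. A sketch of what it likely looks like, assuming the two kubelet flags map to the kubeletCgroups/runtimeCgroups fields of the kops cluster spec:)
spec:
  kubelet:
    runtimeCgroups: /systemd/system.slice
    kubeletCgroups: /systemd/system.slice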
Thanks angeloxx!
I was following the Kubernetes guide:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/
In the instructions, they have you create the file:
/usr/lib/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
with the line:
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd
I took your answer and added it to the end of the ExecStart line:
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
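For context, the complete drop-in would then look roughly like this (the [Service] header, the empty ExecStart= reset, and the Restart=always line are as in the linked guide; treat the exact contents as a sketch):
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
Restart=always
followed by:
sudo systemctl daemon-reload
sudo systemctl restart kubelet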
I'm writing this up in case it helps someone else.
@wolmi Thanks for the edit!
One more addition: the configuration above is for my etcd cluster, not for the Kubernetes nodes. A file like 20-etcd-service-manager.conf on a node would override all the settings in the 10-kubeadm.conf file, causing all kinds of missed configuration. For the nodes, use the /var/lib/kubelet/config.yaml file and/or /var/lib/kubelet/kubeadm-flags.env instead.
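As a minimal sketch of the node-side equivalent, the cgroup driver from the guide's ExecStart line would live in /var/lib/kubelet/config.yaml roughly like this (only cgroupDriver is shown; whether other cgroup-related settings belong here depends on your kubelet version, so treat this as an assumption):
# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd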