Error while executing and initializing kubeadm

I am getting the following error while initializing kubeadm. I also tried running kubeadm reset before executing kubeadm init. The kubelet is also running; I started it with systemctl enable kubelet && systemctl start kubelet. Below is the log after executing kubeadm init:

[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: Connection to "https://192.168.78.48:6443" uses proxy "http://user:pwd@192.168.78.15:3128/". If that is not intended, adjust your proxy settings
[preflight] WARNING: Running with swap on is not supported. Please disable swap or set kubelet's --fail-swap-on flag to false.
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [steller.india.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.140.48]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
 

Below is the output of journalctl -u kubelet:

-- Logs begin at Thu 2017-11-02 16:20:50 IST, end at Fri 2017-11-03 17:11:12 IST. --
Nov 03 16:36:48 steller.india.com systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 03 16:36:48 steller.india.com systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Nov 03 16:36:48 steller.india.com kubelet[52511]: I1103 16:36:48.998467   52511 feature_gate.go:156] feature gates: map[]
Nov 03 16:36:48 steller.india.com kubelet[52511]: I1103 16:36:48.998532   52511 controller.go:114] kubelet config controller: starting controller
Nov 03 16:36:48 steller.india.com kubelet[52511]: I1103 16:36:48.998536   52511 controller.go:118] kubelet config controller: validating combination of defaults and flag
Nov 03 16:36:49 steller.india.com kubelet[52511]: I1103 16:36:49.837248   52511 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 03 16:36:49 steller.india.com kubelet[52511]: I1103 16:36:49.837282   52511 client.go:95] Start docker client with request timeout=2m0s
Nov 03 16:36:49 steller.india.com kubelet[52511]: W1103 16:36:49.839719   52511 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 03 16:36:49 steller.india.com systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 03 16:36:49 steller.india.com kubelet[52511]: I1103 16:36:49.846959   52511 feature_gate.go:156] feature gates: map[]
Nov 03 16:36:49 steller.india.com kubelet[52511]: W1103 16:36:49.847216   52511 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider sho
Nov 03 16:36:49 steller.india.com kubelet[52511]: error: failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such
Nov 03 16:36:49 steller.india.com systemd[1]: Unit kubelet.service entered failed state.
Nov 03 16:36:49 steller.india.com systemd[1]: kubelet.service failed.
Nov 03 16:37:00 steller.india.com systemd[1]: kubelet.service holdoff time over, scheduling restart.
Nov 03 16:37:00 steller.india.com systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 03 16:37:00 steller.india.com systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.134702   52975 feature_gate.go:156] feature gates: map[]
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.134763   52975 controller.go:114] kubelet config controller: starting controller
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.134767   52975 controller.go:118] kubelet config controller: validating combination of defaults and flag
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.141273   52975 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.141364   52975 client.go:95] Start docker client with request timeout=2m0s
Nov 03 16:37:00 steller.india.com kubelet[52975]: W1103 16:37:00.143023   52975 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.149537   52975 feature_gate.go:156] feature gates: map[]
Nov 03 16:37:00 steller.india.com kubelet[52975]: W1103 16:37:00.149780   52975 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider sho
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.179873   52975 certificate_manager.go:361] Requesting new certificate.
Nov 03 16:37:00 steller.india.com kubelet[52975]: E1103 16:37:00.180392   52975 certificate_manager.go:284] Failed while requesting a signed certificate from the master:
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.181404   52975 manager.go:149] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct/system.slice/k
Nov 03 16:37:00 steller.india.com kubelet[52975]: W1103 16:37:00.223876   52975 manager.go:157] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api servic
Nov 03 16:37:00 steller.india.com kubelet[52975]: W1103 16:37:00.224005   52975 manager.go:166] unable to connect to CRI-O api service: Get http://%2Fvar%2Frun%2Fcrio.so
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.262573   52975 fs.go:139] Filesystem UUIDs: map[17856e0b-777f-4065-ac97-fb75d7a1e197:/dev/dm-1 2dc6a878-
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.262604   52975 fs.go:140] Filesystem partitions: map[/dev/sdb:{mountpoint:/D major:8 minor:16 fsType:xfs
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.268969   52975 manager.go:216] Machine: {NumCores:56 CpuFrequency:2600000 MemoryCapacity:540743667712 Hu
Nov 03 16:37:00 steller.india.com kubelet[52975]: 967295 Mtu:1500} {Name:eno49 MacAddress:14:02:ec:82:57:30 Speed:10000 Mtu:1500} {Name:eno50 MacAddress:14:02:ec:82:57:3
Nov 03 16:37:00 steller.india.com kubelet[52975]: evel:1} {Size:262144 Type:Unified Level:2}]} {Id:13 Threads:[12 40] Caches:[{Size:32768 Type:Data Level:1} {Size:32768
Nov 03 16:37:00 steller.india.com kubelet[52975]: s:[26 54] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.270145   52975 manager.go:222] Version: {KernelVersion:3.10.0-229.14.1.el7.x86_64 ContainerOsVersion:Cen
Nov 03 16:37:00 steller.india.com kubelet[52975]: I1103 16:37:00.271263   52975 server.go:422] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaultin
Nov 03 16:37:00 steller.india.com kubelet[52975]: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to
Nov 03 16:37:00 steller.india.com systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 03 16:37:00 steller.india.com systemd[1]: Unit kubelet.service entered failed state.
Nov 03 16:37:00 steller.india.com systemd[1]: kubelet.service failed.

It looks like you have swap enabled on your server. Could you disable it and re-run the init command?

error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to

Here are the steps for disabling swap in CentOS: https://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-swap-removing.html

Just disable swap on the machine: sudo swapoff -a
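To confirm swap is actually off after running it, a quick sketch (swapon is part of util-linux and should be available on most distributions):

    swapon --show   # no output means no swap devices are active
    free -h         # the Swap line should report 0B total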

sudo swapoff -a does not persist across reboots.

You can keep swap disabled across reboots by commenting out (adding a # in front of) the swap entry in the /etc/fstab file. This prevents the swap partition from being mounted automatically after a reboot.

Steps:

  1. Open the file /etc/fstab

  2. Look for a line like the one below:

    # swap was on /dev/sda5 during installation

    UUID=e7bf7d6e-e90d-4a96-91db-7d5f282f9363 none swap sw 0 0

  3. Comment out the line above with # and save. It should look like this (or use the one-liner shown after these steps):

    # swap was on /dev/sda5 during installation

    #UUID=e7bf7d6e-e90d-4a96-91db-7d5f282f9363 none swap sw 0 0

  4. Reboot the system, or run "sudo swapoff -a" for the current session to avoid rebooting.
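Steps 2-3 can also be done non-interactively. A minimal sketch using GNU sed that comments out any not-yet-commented fstab line mentioning swap, keeping a backup of the original file:

    # comment out active swap entries in /etc/fstab (backup kept as /etc/fstab.bak)
    sudo sed -i.bak '/swap/ s/^[^#]/#&/' /etc/fstab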

Some Kubernetes deployments require swap

Generally speaking, the official requirement is to disable swap.

The reason? Kubernetes doesn't support it yet. IMO they require you to disable swap to avoid problems with workload shifting in multi-node clusters.

However, I have a valid use case - I'm developing an on-premises product, a Linux distribution with kubeadm included. There is no horizontal scaling by design. To survive opportunistic memory spikes and keep running (however slowly), I absolutely need swap.


To install kubeadm with swap enabled:

  1. Create a file at /etc/systemd/system/kubelet.service.d/20-allow-swap.conf with the content:

    [Service]
    Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
    
  2. Run

    sudo systemctl daemon-reload
    
  3. Initialize kubeadm with the flag --ignore-preflight-errors=Swap:

    kubeadm init --ignore-preflight-errors=Swap
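Before re-running init, you can sanity-check that systemd picked up the drop-in, a quick sketch (the file name matches the one created in step 1):

    systemctl cat kubelet    # the 20-allow-swap.conf drop-in should be listed
    ps -ef | grep kubelet    # once running, the kubelet should carry --fail-swap-on=false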
    

I guess different users hit this issue for different reasons, so I'll add some extra explanation here.

The quickest and safest solution is to disable swap - as mentioned in the link shared by @Jossef_Harush, and this is also what K8S recommends.
But as he also mentioned - some workloads require deeper memory management.

If you ran into the error above because swap was enabled on purpose - before considering enabling swap, I'd recommend reading about Evicting end-user Pods:

If the kubelet is unable to reclaim sufficient resource on the node, kubelet begins evicting Pods.

The kubelet ranks Pods for eviction first by whether or not their usage of the starved resource exceeds requests, then by Priority, and then by the consumption of the starved compute resource relative to the Pods' scheduling requests.

As a result, kubelet ranks and evicts Pods in the following order:

  • BestEffort or Burstable Pods whose usage of a starved resource exceeds its request. Such pods are ranked by Priority, and then usage above request.

  • Guaranteed pods and Burstable pods whose usage is beneath requests are evicted last. Guaranteed Pods are guaranteed only when requests and limits are specified for all the containers and they are equal. Such pods are guaranteed to never be evicted because of another Pod's resource consumption. If a system daemon (such as kubelet, docker, and journald) is consuming more resources than were reserved via system-reserved or kube-reserved allocations, and the node only has Guaranteed or Burstable Pods using less than requests remaining, then the node must choose to evict such a Pod in order to preserve node stability and to limit the impact of the unexpected consumption to other Pods. In this case, it will choose to evict pods of Lowest Priority first.
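As a side note, if you want to check whether such evictions have already happened in your cluster, a quick sketch (eviction events carry the reason Evicted):

    kubectl get events --all-namespaces --field-selector reason=Evicted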

Make sure you're also familiar with:

1 ) The 3 qos classes - make sure your high-priority workloads are running with the Guaranteed (or at least Burstable) class.

2 ) Pod Priority and Preemption.
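To illustrate, you can check which QoS class a running Pod was actually assigned (a minimal sketch; the Pod name my-critical-pod is hypothetical):

    kubectl get pod my-critical-pod -o jsonpath='{.status.qosClass}'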

To disable SWAP on Ubuntu and make it persistent:

sudo swapoff -a && sudo sed -i '/swap/d' /etc/fstab

The second command permanently disables SWAP by deleting the SWAP line(s) from the /etc/fstab file.