"Error while initializing daemon" error="exit status 2" subsys=daemon in Cilium CNI on Kubernetes

In our Kubernetes cluster we are using the Cilium CNI, but it is failing on a worker node. The error messages are shown below.

$ kubectl get pod -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
cilium-dpd4k                       0/1     Running   2          97s
cilium-operator-55658fb5c4-qpdqb   1/1     Running   0          6m30s
cilium-sc7x6                       1/1     Running   0          6m30s
coredns-6955765f44-2tjf8           1/1     Running   0          6m31s
coredns-6955765f44-h96c4           1/1     Running   0          6m31s
etcd-store                         1/1     Running   0          6m26s
kube-apiserver-store               1/1     Running   0          6m26s
kube-controller-manager-store      1/1     Running   0          6m26s
kube-proxy-8xz8n                   1/1     Running   0          97s
kube-proxy-gxgfv                   1/1     Running   0          6m30s
kube-scheduler-store               1/1     Running   0          6m26s

$ kubectl logs -f cilium-dpd4k -n kube-system
level=info msg="Skipped reading configuration file" reason="Config File \"ciliumd\" Not Found in \"[/root]\"" subsys=daemon
level=info msg="  --access-log=''" subsys=daemon
level=info msg="  --agent-labels=''" subsys=daemon
level=info msg="  --allow-localhost='auto'" subsys=daemon
level=info msg="  --annotate-k8s-node='true'" subsys=daemon
level=info msg="  --auto-create-cilium-node-resource='true'" subsys=daemon
level=info msg="  --auto-direct-node-routes='false'" subsys=daemon
level=info msg="  --blacklist-conflicting-routes='true'" subsys=daemon
level=info msg="  --bpf-compile-debug='false'" subsys=daemon
level=info msg="  --bpf-ct-global-any-max='262144'" subsys=daemon
level=info msg="  --bpf-ct-global-tcp-max='524288'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-any='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp='6h0m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp-fin='10s'" subsys=daemon
...

level=warning msg="+ ip -6 rule del fwmark 0xA00/0xF00 pref 10 lookup 2005" subsys=daemon
level=warning msg="+ true" subsys=daemon
level=warning msg="+ sed -i /ENCAP_GENEVE/d /var/run/cilium/state/globals/node_config.h" subsys=daemon
level=warning msg="+ sed -i /ENCAP_VXLAN/d /var/run/cilium/state/globals/node_config.h" subsys=daemon
level=warning msg="+ '[' vxlan = vxlan ']'" subsys=daemon
level=warning msg="+ echo '#define ENCAP_VXLAN 1'" subsys=daemon
level=warning msg="+ '[' vxlan = vxlan -o vxlan = geneve ']'" subsys=daemon
level=warning msg="+ ENCAP_DEV=cilium_vxlan" subsys=daemon
level=warning msg="+ ip link show cilium_vxlan" subsys=daemon
level=warning msg="37450: cilium_vxlan: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000" subsys=daemon
level=warning msg="    link/ether 7e:53:e8:db:1d:ef brd ff:ff:ff:ff:ff:ff" subsys=daemon
level=warning msg="+ setup_dev cilium_vxlan" subsys=daemon
level=warning msg="+ local -r NAME=cilium_vxlan" subsys=daemon
level=warning msg="+ ip link set cilium_vxlan up" subsys=daemon
level=warning msg="RTNETLINK answers: Address already in use" subsys=daemon
level=error msg="Error while initializing daemon" error="exit status 2" subsys=daemon
level=fatal msg="Error while creating daemon" error="exit status 2" subsys=daemon

Cluster information:

$ uname -a
Linux STORE 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64 GNU/Linux

$ kubeadm version 
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:27:49Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

$ ip address
60: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:18:01:f4:7d:8b brd ff:ff:ff:ff:ff:ff
    inet 10.36.0.0/12 brd 10.47.255.255 scope global weave
       valid_lft forever preferred_lft forever
    inet6 fe80::7c18:1ff:fef4:7d8b/64 scope link 
       valid_lft forever preferred_lft forever
61: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5a:24:ec:73:cd:7f brd ff:ff:ff:ff:ff:ff
63: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP group default 
    link/ether 66:2d:12:49:83:7a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::642d:12ff:fe49:837a/64 scope link 
       valid_lft forever preferred_lft forever
64: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5e:9e:39:10:31:1a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c9e:39ff:fe10:311a/64 scope link 
       valid_lft forever preferred_lft forever
65: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
    link/ether ae:63:29:ac:de:fd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ac63:29ff:feac:defd/64 scope link 
       valid_lft forever preferred_lft forever
37365: vethe0a862c@if37364: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 1e:70:54:f4:ad:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::1c70:54ff:fef4:ada0/64 scope link 
       valid_lft forever preferred_lft forever
37369: veth6628311@if37368: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 1a:ed:12:31:a2:31 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::18ed:12ff:fe31:a231/64 scope link 
       valid_lft forever preferred_lft forever
37372: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0

I have searched and found this issue, but I was not able to resolve the problem. What are the steps to fix it, and what does the following mean:

Since we no longer delete the old cilium_host/cilium_net veth pair if they already exist, 'ip route add' will complain of existing routes. Fix this by using 'ip route replace' instead.
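
What that commit message describes is that ip route add returns an error ("RTNETLINK answers: File exists") when the route is already present, whereas ip route replace installs the route whether it exists or not. A minimal illustration (192.0.2.0/24 and the device are placeholder values, not taken from your node):

$ ip route add 192.0.2.0/24 dev cilium_host
$ ip route add 192.0.2.0/24 dev cilium_host
RTNETLINK answers: File exists
$ ip route replace 192.0.2.0/24 dev cilium_host

Since the agent no longer deletes an existing cilium_host/cilium_net veth pair, routes attached to it may already be installed, so startup has to use the idempotent form.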

You can do one of the following:

1. You can uninstall flannel by following these steps:

rm -rf /var/lib/cni/
rm -rf /run/flannel
rm -rf /etc/cni/

Remove the interfaces related to flannel:

ip link 

For each flannel interface, run the following (a concrete example is shown after these commands):

ifconfig <name of interface from ip link> down
ip link delete <name of interface from ip link>
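
For example, assuming flannel left behind its usual flannel.1 and cni0 interfaces (hypothetical names; substitute whatever ip link actually shows on your node):

$ ifconfig flannel.1 down
$ ip link delete flannel.1
$ ifconfig cni0 down
$ ip link delete cni0

Note that on the node shown in the question, the leftover interfaces actually belong to weave (weave, vethwe-datapath, vethwe-bridge, vxlan-6784); the same cleanup applies to those.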

2. You can have both flannel and cilium in the cluster at the same time. You need to configure flannel and cilium following this doc, as sketched below. Note that this is a beta feature and is not yet recommended for production use.
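
As a sketch only: the ConfigMap keys flannel-master-device and flannel-uninstall-on-exit below are taken from the Cilium 1.6/1.7 flannel integration guide from memory, so verify them against the doc linked above for your Cilium version:

$ kubectl -n kube-system edit configmap cilium-config
# add under data: (cni0 is assumed to be flannel's bridge device; adjust to your setup)
#   flannel-master-device: "cni0"
#   flannel-uninstall-on-exit: "false"
$ kubectl -n kube-system rollout restart daemonset/cilium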