Cilium pods crashing due to node IP outside the valid allocation range
I am deploying a Kubernetes cluster with kubespray.
I changed the network plugin from Calico to Cilium.
Unfortunately, some of the Cilium pods are stuck in CrashLoopBackOff.
kubectl --namespace kube-system get pods --selector k8s-app=cilium --sort-by='.status.containerStatuses[0].restartCount' -o wide
NAME           READY   STATUS             RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
cilium-2gmwm   1/1     Running            0          14m   10.10.3.102   nodemaster1   <none>           <none>
cilium-9ccdp   1/1     Running            0          14m   10.10.3.110   node6         <none>           <none>
cilium-c9nh6   1/1     Running            0          14m   10.10.3.107   node3         <none>           <none>
cilium-r9w4z   0/1     CrashLoopBackOff   6          14m   10.10.3.109   node5         <none>           <none>
cilium-f8z2q   1/1     Running            0          14m   10.10.3.105   node1         <none>           <none>
cilium-d96cd   0/1     CrashLoopBackOff   7          14m   10.10.3.106   node2         <none>           <none>
cilium-jgmcf   0/1     CrashLoopBackOff   7          14m   10.10.3.103   nodemaster2   <none>           <none>
cilium-9zqnr   0/1     CrashLoopBackOff   7          14m   10.10.3.108   node4         <none>           <none>
cilium-llt9p   0/1     CrashLoopBackOff   7          14m   10.10.3.104   nodemaster3   <none>           <none>
When inspecting the logs of the crashing pods, I can see this fatal error message:
level=fatal msg="The allocation CIDR is different from the previous cilium instance. This error is most likely caused by a temporary network disruption to the kube-apiserver that prevent Cilium from retrieve the node's IPv4/IPv6 allocation range. If you believe the allocation range is supposed to be different you need to clean up all Cilium state with the `cilium cleanup` command on this node. Be aware this will cause network disruption for all existing containers managed by Cilium running on this node and you will have to restart them." error="Unable to allocate internal IPv4 node IP 10.233.71.1: provided IP is not in the valid range. The range of valid IPs is 10.233.70.0/24." subsys=daemon
It seems that the node's IP (10.233.71.1 in this case) does not fall within the valid range of 10.233.70.0/24.
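To double-check where that range comes from: it should match the podCIDR that the controller-manager assigned to the node, which can be read straight from the node object (node2 here, taken from the table above):

# Show the pod CIDR allocated to one of the crashing nodes
kubectl get node node2 -o jsonpath='{.spec.podCIDR}'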
I tried to modify kubespray's main.yaml file to change the subnets, but my various attempts only made the number of crashing pods go up or down...
For example, on this run I tried:
kube_service_addresses: 10.233.0.0/17
kube_pods_subnet: 10.233.128.0/17
kube_network_node_prefix: 18
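In hindsight, this particular combination could never have worked: splitting the 10.233.128.0/17 pods subnet into /18 per-node blocks yields only 2^(18-17) = 2 node CIDRs, far fewer than the 9 nodes above. Which nodes actually received a podCIDR can be checked with:

# List each node next to the pod CIDR the controller-manager assigned to it
kubectl get nodes -o custom-columns='NAME:.metadata.name,PODCIDR:.spec.podCIDR'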
As you can see, it didn't work.
If you have any ideas... :-)
Thanks
With the help of a Cilium developer, I finally managed to fix the problem!
You have to set the key clean-cilium-state from false to true in the kubespray file kubespray/roles/network_plugin/cilium/templates/cilium-config.yml.j2.
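For reference, a minimal sketch of the rendered ConfigMap after that change; the surrounding layout is assumed (only the clean-cilium-state key is confirmed by the fix), so the actual template may carry more keys:

# Sketch of the rendered cilium-config ConfigMap; layout assumed, key confirmed
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  clean-cilium-state: "true"   # flip back to "false" after the deployment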
After the deployment you have to revert this boolean. To do so, run kubectl edit configmap cilium-config -n kube-system and change the key clean-cilium-state back from true to false.
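If you prefer not to open an editor, the same revert can be done non-interactively (an equivalent alternative, not the command the developer gave me):

# Flip clean-cilium-state back to "false" in place
kubectl -n kube-system patch configmap cilium-config --type merge -p '{"data":{"clean-cilium-state":"false"}}'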
Finally, you have to kill the Cilium pods.
List the pods: kubectl get pods -n kube-system
Kill the pods: kubectl delete pods cilium-xxx cilium-xxx ...
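Since every Cilium pod carries the k8s-app=cilium label (see the kubectl output above), they can also be deleted in one go; the DaemonSet recreates them with the new configuration:

# Delete all Cilium pods at once instead of naming them one by one
kubectl -n kube-system delete pods -l k8s-app=cilium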
This is now tracked as an issue in the Cilium repository.