Kubeadm - unable to join nodes - request canceled while waiting for connection
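I'm trying to set up a k8s cluster on 3 Debian 10 VMs with kubeadm.
All VMs have 2 network interfaces: eth0 is the public static-IP interface and eth1 is the local interface, with static IPs in 192.168.0.0/16:
- master: 192.168.1.1
- node1: 192.168.2.1
- node2: 192.168.2.2
All nodes can reach each other.
Output of ip a on the master host: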
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:52:70:53:d5:12 brd ff:ff:ff:ff:ff:ff
inet XXX.XXX.244.240/24 brd XXX.XXX.244.255 scope global dynamic eth0
valid_lft 257951sec preferred_lft 257951sec
inet6 2a01:367:c1f2::112/48 scope global
valid_lft forever preferred_lft forever
inet6 fe80::252:70ff:fe53:d512/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:95:af:b0:8c:c4 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/16 brd 192.168.255.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::295:afff:feb0:8cc4/64 scope link
valid_lft forever preferred_lft forever
The master initializes fine:
kubeadm init --upload-certs --apiserver-advertise-address=192.168.1.1 --apiserver-cert-extra-sans=192.168.1.1,XXX.XXX.244.240 --pod-network-cidr=10.40.0.0/16 -v=5
But when I join the worker nodes, kube-api is unreachable:
kubeadm join 192.168.1.1:6443 --token 7bl0in.s6o5kyqg27utklcl --discovery-token-ca-cert-hash sha256:7829b6c7580c0c0f66aa378c9f7e12433eb2d3b67858dd3900f7174ec99cda0e -v=5
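The error only says that the join timed out while connecting. A minimal Python sketch, run from a worker, can split the problem into two stages: does a plain TCP connection to 192.168.1.1:6443 open, and does a TLS handshake then complete? The function name probe_apiserver is hypothetical, and certificate verification is deliberately turned off because the goal is only to test the network path. A TCP connect that succeeds followed by a TLS handshake that hangs is one pattern consistent with an MTU problem: the small TCP setup packets get through, while the larger packets carrying the server certificate may not.

```python
import socket
import ssl


def probe_apiserver(host: str, port: int = 6443, timeout: float = 5.0) -> str:
    """Report which stage of reaching host:port fails (hypothetical helper).

    Returns "tcp_failed" if no TCP connection can be opened, "tls_failed"
    if TCP connects but the TLS handshake errors out or times out, and
    "ok" if the handshake completes.
    """
    try:
        # create_connection applies `timeout` to the returned socket as well
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Verification is disabled on purpose: this only tests whether a
    # handshake can complete over the path, not whether the cert is trusted.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # wrap_socket performs the handshake immediately, under the same timeout
        with ctx.wrap_socket(raw, server_hostname=host):
            return "ok"
    except OSError:  # ssl.SSLError and socket timeouts are OSError subclasses
        return "tls_failed"
    finally:
        raw.close()  # harmless if wrap_socket already took over the descriptor
```

For example, print(probe_apiserver("192.168.1.1")) on node1 or node2.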
Netstat from the master:
# netstat -tupn | grep :6443
tcp 0 0 192.168.1.1:43332 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:41774 192.168.1.1:6443 ESTABLISHED 5362/kube-proxy
tcp 0 0 192.168.1.1:41744 192.168.1.1:6443 ESTABLISHED 5236/kubelet
tcp 0 0 192.168.1.1:43376 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43398 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:41652 192.168.1.1:6443 ESTABLISHED 4914/kube-scheduler
tcp 0 0 192.168.1.1:43448 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43328 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43452 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43386 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43350 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:41758 192.168.1.1:6443 ESTABLISHED 5182/kube-controlle
tcp 0 0 192.168.1.1:43306 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43354 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43296 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:43408 192.168.1.1:6443 TIME_WAIT -
tcp 0 0 192.168.1.1:41730 192.168.1.1:6443 ESTABLISHED 5182/kube-controlle
tcp 0 0 192.168.1.1:41738 192.168.1.1:6443 ESTABLISHED 4914/kube-scheduler
tcp 0 0 192.168.1.1:43444 192.168.1.1:6443 TIME_WAIT -
tcp6 0 0 192.168.1.1:6443 192.168.1.1:41730 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 192.168.1.1:6443 192.168.1.1:41744 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 192.168.1.1:6443 192.168.1.1:41738 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 192.168.1.1:6443 192.168.1.1:41652 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 ::1:6443 ::1:42862 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 192.168.1.1:6443 192.168.1.1:41758 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 ::1:42862 ::1:6443 ESTABLISHED 5094/kube-apiserver
tcp6 0 0 192.168.1.1:6443 192.168.1.1:41774 ESTABLISHED 5094/kube-apiserver
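Every ESTABLISHED connection to port 6443 in this snapshot comes from the master itself (192.168.1.1 or ::1); none come from 192.168.2.x. A small sketch that pulls the remote peers out of netstat -tupn output makes that check repeatable while a join is running. The helper peers_of_port is hypothetical and assumes the column order shown above.

```python
def peers_of_port(netstat_output: str, port: int = 6443) -> set:
    """Return remote IPs with ESTABLISHED connections to local :port (hypothetical helper)."""
    peers = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.endswith(f":{port}"):
            # rsplit keeps IPv6 addresses such as "::1" intact
            peers.add(foreign.rsplit(":", 1)[0])
    return peers
```

Running it on the master during a join and seeing only local addresses suggests the workers' connections are not being established.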
Pods on the master:
# kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-558bd4d5db-8qhhl 0/1 Pending 0 12m <none> <none> <none> <none>
coredns-558bd4d5db-9hj7z 0/1 Pending 0 12m <none> <none> <none> <none>
etcd-cloud604486.fastpipe.io 1/1 Running 0 12m 2a01:367:c1f2::112 cloud604486.fastpipe.io <none> <none>
kube-apiserver-cloud604486.fastpipe.io 1/1 Running 0 12m 2a01:367:c1f2::112 cloud604486.fastpipe.io <none> <none>
kube-controller-manager-cloud604486.fastpipe.io 1/1 Running 0 12m 2a01:367:c1f2::112 cloud604486.fastpipe.io <none> <none>
kube-proxy-dzd42 1/1 Running 0 12m 2a01:367:c1f2::112 cloud604486.fastpipe.io <none> <none>
kube-scheduler-cloud604486.fastpipe.io 1/1 Running 0 12m 2a01:367:c1f2::112 cloud604486.fastpipe.io <none> <none>
All VMs have these kernel parameters set:
{ name: 'vm.swappiness', value: '0' }
{ name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
{ name: 'net.bridge.bridge-nf-call-ip6tables', value: '1'}
{ name: 'net.ipv4.ip_forward', value: 1 }
{ name: 'net.ipv6.conf.all.forwarding', value: 1}
The br_netfilter kernel module is loaded and iptables is set to legacy mode (via alternatives).
Am I missing something?
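A sketch for confirming these settings on each VM by reading /proc directly. The helper names are hypothetical. Keys map to files under /proc/sys with dots turned into directory separators, and the net.bridge.* files only exist while br_netfilter is loaded, so a missing file there (reported as None) points at the module.

```python
from __future__ import annotations

from pathlib import Path

# Values from the question, as strings because /proc/sys reports text.
EXPECTED = {
    "vm.swappiness": "0",
    "net.bridge.bridge-nf-call-iptables": "1",
    "net.bridge.bridge-nf-call-ip6tables": "1",
    "net.ipv4.ip_forward": "1",
    "net.ipv6.conf.all.forwarding": "1",
}


def sysctl_mismatches(expected: dict[str, str], root: str = "/proc/sys") -> dict[str, str | None]:
    """Map each mismatching key to its actual value (None if the file is missing)."""
    bad = {}
    for key, want in expected.items():
        path = Path(root, *key.split("."))
        try:
            got = path.read_text().strip()
        except OSError:
            got = None
        if got != want:
            bad[key] = got
    return bad


def module_loaded(name: str, modules_file: str = "/proc/modules") -> bool:
    """True if `name` is the first field of some line in /proc/modules."""
    try:
        lines = Path(modules_file).read_text().splitlines()
    except OSError:
        return False
    return any(line.split()[0] == name for line in lines if line.strip())
```

On a VM, sysctl_mismatches(EXPECTED) should return an empty dict and module_loaded("br_netfilter") should return True.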
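The reason for your problem is that TLS connections between components must be secure. From the kubelet's point of view, the connection is secure only if the API server certificate lists the IP of the server we are connecting to among its Subject Alternative Names (SANs). As you can see yourself, you added only one IP address to the SANs.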
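To see which IPs the API server certificate actually covers, its SAN line can be printed on the master with openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text (the kubeadm default path). The sketch below parses such a line and checks whether the address used in kubeadm join is listed. The function names and the sample SAN line in the usage note are hypothetical, not taken from the question.

```python
import ipaddress


def san_ips(san_line: str) -> set:
    """Extract IP entries from an openssl-style SAN line such as
    'DNS:kubernetes, IP Address:10.96.0.1, IP Address:192.168.1.1'."""
    ips = set()
    for entry in san_line.split(","):
        # partition on the first ":" so IPv6 values stay whole
        kind, _, value = entry.strip().partition(":")
        if kind == "IP Address":
            ips.add(ipaddress.ip_address(value.strip()))
    return ips


def covers(san_line: str, target: str) -> bool:
    """True if `target` (IPv4 or IPv6) appears among the SAN IP entries."""
    return ipaddress.ip_address(target) in san_ips(san_line)
```

For example, covers("DNS:kubernetes, IP Address:10.96.0.1, IP Address:192.168.1.1", "192.168.1.1") returns True. Comparing ipaddress objects rather than strings makes the expanded IPv6 form openssl prints match the compressed form used elsewhere.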
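How do you fix it? There are two ways:
- Use the --discovery-token-unsafe-skip-ca-verification flag with your kubeadm join command on the nodes.
- Add the IP address of the second NIC to the SANs of the API server certificate during cluster initialization (kubeadm init).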
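For further reading, see the directly related PR #93264, which was introduced in Kubernetes 1.19.
After a week of tinkering, the problem turned out to be a misconfigured network on the service provider's side.
For anyone running into the same issue: check your network's MTU. In my case it defaulted to 1500 instead of the recommended 1450.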
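A sketch for spotting interfaces whose MTU is above the 1450 the provider recommended, by parsing ip a output like the one in the question (where both eth0 and eth1 report mtu 1500). The answer does not say which interface needed the lower value, so the sketch checks all of them except lo; the helper names are hypothetical. On Debian 10 the value can be changed at runtime with ip link set dev eth1 mtu 1450 and persisted with an mtu 1450 line in the interface's stanza in /etc/network/interfaces.

```python
import re

# Matches interface header lines of `ip a`, e.g.
# "3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast ..."
IFACE_RE = re.compile(r"^\d+:\s+(\S+?):\s+<[^>]*>\s+mtu\s+(\d+)")


def interface_mtus(ip_a_output: str) -> dict:
    """Map interface name -> MTU from `ip a` output (hypothetical helper)."""
    mtus = {}
    for line in ip_a_output.splitlines():
        m = IFACE_RE.match(line.strip())
        if m:
            mtus[m.group(1)] = int(m.group(2))
    return mtus


def too_large(mtus: dict, limit: int = 1450, skip=("lo",)) -> dict:
    """Interfaces (other than those in `skip`) whose MTU exceeds `limit`."""
    return {name: mtu for name, mtu in mtus.items() if name not in skip and mtu > limit}
```

Run against the master's output above, too_large flags both eth0 and eth1 at 1500.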