Pods not accessible (timeout) on 3 Node cluster created in AWS ec2 from master

Question

I have a 3-node cluster in AWS EC2 (CentOS 8 AMI).

When I try to access pods scheduled on the worker nodes from the master node:

kubectl exec -it kube-flannel-ds-amd64-lfzpd -n kube-system /bin/bash
Error from server: error dialing backend: dial tcp 10.41.12.53:10250: i/o timeout

kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
kube-system   coredns-54ff9cd656-8mpbx         1/1     Running   2          7d21h   10.244.0.7     master           <none>           <none>
kube-system   coredns-54ff9cd656-xcxvs         1/1     Running   2          7d21h   10.244.0.6     master           <none>           <none>
kube-system   etcd-master                      1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-apiserver-master            1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-controller-manager-master   1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-flannel-ds-amd64-8zgpw      1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-flannel-ds-amd64-lfzpd      1/1     Running   2          7d21h   10.41.12.53    worker1          <none>           <none>
kube-system   kube-flannel-ds-amd64-nhw5j      1/1     Running   2          7d21h   10.41.15.9     worker3          <none>           <none>
kube-system   kube-flannel-ds-amd64-s6nms      1/1     Running   2          7d21h   10.41.15.188   worker2          <none>           <none>
kube-system   kube-proxy-47s8k                 1/1     Running   2          7d21h   10.41.15.9     worker3          <none>           <none>
kube-system   kube-proxy-6lbvq                 1/1     Running   2          7d21h   10.41.15.188   worker2          <none>           <none>
kube-system   kube-proxy-vhmfp                 1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>
kube-system   kube-proxy-xwsnk                 1/1     Running   2          7d21h   10.41.12.53    worker1          <none>           <none>
kube-system   kube-scheduler-master            1/1     Running   2          7d21h   10.41.14.198   master           <none>           <none>

kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
master           Ready    master   7d21h   v1.13.10
worker1          Ready    <none>   7d21h   v1.13.10
worker2          Ready    <none>   7d21h   v1.13.10
worker3          Ready    <none>   7d21h   v1.13.10

I tried the following steps on all nodes, but no luck so far:

Ran iptables -w -P FORWARD ACCEPT on all nodes
Turned on masquerade
Opened port 10250/tcp
Opened port 8472/udp
Started kubelet
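
For reference, the steps listed above might look like the following on CentOS 8. This is only a sketch: it assumes firewalld is managing the host firewall, and the `run` and `open_node_ports` helper names are made up for illustration. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch only: assumes firewalld manages the host firewall (an assumption,
# not stated in the question). Helper names are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

open_node_ports() {
  run iptables -w -P FORWARD ACCEPT                   # allow forwarded pod traffic
  run firewall-cmd --permanent --add-masquerade       # turn on masquerade
  run firewall-cmd --permanent --add-port=10250/tcp   # kubelet API
  run firewall-cmd --permanent --add-port=8472/udp    # flannel VXLAN
  run firewall-cmd --reload                           # apply the permanent rules
  run systemctl restart kubelet                       # start (or restart) kubelet
}
```

Calling `DRY_RUN=1 open_node_ports` first shows what would be executed; running `open_node_ports` as root on each node applies it.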

Any pointers would be helpful.
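
One way to tell whether the timeout is a network-level block rather than a kubelet problem is to test the TCP port directly from the master. A minimal sketch, assuming bash with `/dev/tcp` support and the coreutils `timeout` command; `check_port` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise
# (connection refused, filtered, or timed out).
check_port() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `check_port 10.41.12.53 10250 && echo reachable || echo "blocked or closed"` run on the master checks the kubelet port of worker1. This only covers TCP, so it says nothing about the 8472/udp VXLAN port.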

Answer 1

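Flannel does not support NFT (nftables), and since you are using CentOS 8, you can't fall back to iptables.
Your best bet in this situation is to switch to Calico.
You have to update the Calico DaemonSet with: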

....
    Environment:
      FELIX_IPTABLESBACKEND: NFT
....

Or use version 3.12 or later, since it adds
autodetection of the iptables backend:

Previous versions of Calico required you to specify the host’s iptables backend (one of NFT or Legacy). With this release, Calico can now autodetect the iptables variant on the host by setting the Felix configuration parameter IptablesBackend to Auto. This is useful in scenarios where you don’t know what the iptables backend might be such as in mixed deployments. For more information, see the documentation for iptables dataplane configuration

Or switch to Ubuntu 20.04. Ubuntu doesn't use nftables yet.
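
To see which backend a host actually uses before choosing the Felix setting, the mode suffix printed by `iptables --version` can be inspected. The sketch below is an illustration only; `felix_backend_for` is a made-up helper name, and treating a version string without a suffix as legacy is an assumption based on older iptables releases.

```shell
#!/usr/bin/env bash
# Map the output of `iptables --version` to a FELIX_IPTABLESBACKEND value.
# iptables 1.8+ prints e.g. "iptables v1.8.4 (nf_tables)" or "... (legacy)".
felix_backend_for() {
  case "$1" in
    *nf_tables*) echo "NFT" ;;
    *legacy*)    echo "Legacy" ;;
    *)           echo "Legacy" ;;  # assumption: no suffix means a pre-1.8, legacy-only iptables
  esac
}
```

On a node, `felix_backend_for "$(iptables --version)"` prints the value to use. Assuming the DaemonSet is named `calico-node` in `kube-system` (the name used by the stock manifests, but check your cluster), it could then be applied with `kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=NFT`.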

Answer 2

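The issue was caused by the inbound ports in the SG (security group). I added these ports to the SG and was able to get past the issue.

2222
24007
24008
49152-49251

My original installer script doesn't need the step above when it runs on VMs and standalone machines. Since SGs are specific to EC2, the inbound ports have to be allowed there. The thing to note is that all my nodes (master and workers) are in the same SG. Even so, the ports must be opened in the inbound rules; that is how SGs work.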
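
For anyone who prefers the AWS CLI over the console, the inbound rules above could be added roughly like this. It is a sketch, not what I actually ran: the `sg_ingress_commands` helper and the security group ID are placeholders, and it assumes TCP rules whose source is the same SG the nodes belong to. The helper only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Print (not run) one `aws ec2 authorize-security-group-ingress` call per port
# or port range, allowing TCP traffic between members of the same SG.
# Usage: sg_ingress_commands SG_ID PORT_OR_RANGE...
sg_ingress_commands() {
  local sg_id="$1"; shift
  local port
  for port in "$@"; do
    echo "aws ec2 authorize-security-group-ingress --group-id ${sg_id} --protocol tcp --port ${port} --source-group ${sg_id}"
  done
}
```

For example, `sg_ingress_commands sg-0123456789abcdef0 2222 24007 24008 49152-49251` prints four commands, which can be piped to `sh` once they look right. Port 10250 from the error in the question can be passed the same way if it is not already allowed.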

kubernetes

amazon-ec2

centos