AWS EKS node creation failure

I have a cluster in AWS that was created by following these instructions.

I then tried to add nodes to this cluster by following this documentation.

It seems the nodes cannot be created. The vpc-cni and coredns add-ons both report the health issue type insufficientNumberOfReplicas: The add-on is unhealthy because it doesn't have the desired number of replicas.

Status of the pods, from kubectl get pods -n kube-system:

NAME                       READY   STATUS             RESTARTS   AGE
aws-node-9cwkd             0/1     CrashLoopBackOff   13         42m
aws-node-h4qjt             0/1     CrashLoopBackOff   13         42m
aws-node-jrn5x             0/1     CrashLoopBackOff   13         43m
coredns-745979c988-25fcc   0/1     Pending            0          120m
coredns-745979c988-qvh7h   0/1     Pending            0          120m
kube-proxy-2bmlq           1/1     Running            0          42m
kube-proxy-hjcrw           1/1     Running            0          43m
kube-proxy-j9r9n           1/1     Running            0          42m

Logs of the aws-node-9cwkd pod:

{"level":"info","ts":"2021-11-30T14:11:14.156Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2021-11-30T14:11:14.157Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2021-11-30T14:11:14.177Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-11-30T14:11:14.179Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-11-30T14:11:16.189Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:18.198Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:20.205Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:22.215Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:24.226Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

Running kubectl describe pod aws-node-h4qjt -n kube-system shows the following error:

Readiness probe failed: {"level":"info","ts":"2021-11-30T14:11:07.145Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}

Any help in getting the nodes created successfully in the cluster would be greatly appreciated.

This is most likely a problem with the node service role. You can get more information by exec'ing into the pod and looking at ipamd.log:

kubectl exec -it aws-node-9cwkd -n kube-system -- /bin/bash 
cat /host/var/log/aws-routed-eni/ipamd.log

Here is an example of the error from when I ran into the same problem:

{"level":"error","ts":"2021-12-02T13:27:51.464Z","caller":"ipamd/ipamd.go:444","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0c01bd25ae6999ed5]: UnauthorizedOperation: You are not authorized to perform this operation.\n\tstatus code: 403, request id: 0438b84b-8052-4f31-9d63-c2ff7512f131"}
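The log holds one JSON object per line, so a quick way to surface entries like the one above is to filter on the level field. This is a minimal sketch that assumes the log format shown in the sample; the helper name ipamd_errors is made up for illustration and is not part of any AWS tooling.

```shell
# Print only the error-level entries from an ipamd log.
# Assumes one JSON object per line with a "level" field, as in the sample above.
# ipamd_errors is a hypothetical helper name.
ipamd_errors() {
  # With a file argument, grep reads that file; with no arguments it reads stdin.
  grep -F -- '"level":"error"' "$@"
}

# Usage from inside the aws-node container:
#   ipamd_errors /host/var/log/aws-routed-eni/ipamd.log
# Usage from outside, piping the log through kubectl exec:
#   kubectl exec aws-node-9cwkd -n kube-system -- cat /host/var/log/aws-routed-eni/ipamd.log | ipamd_errors
```

An UnauthorizedOperation entry in the output points at missing IAM permissions rather than at a networking fault.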

In my case, I had to attach the AmazonEKS_CNI_Policy policy to the node IAM role.

https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
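With the AWS CLI, attaching the managed policy could look roughly like the sketch below. The function name attach_cni_policy and the DRY_RUN switch are assumptions added for illustration, and the role name is a placeholder that you need to replace with the IAM role your node group actually uses.

```shell
# Attach the AWS-managed AmazonEKS_CNI_Policy to a node IAM role.
# attach_cni_policy and DRY_RUN are hypothetical conveniences, not AWS features.
attach_cni_policy() {
  role_name="$1"   # placeholder: the IAM role used by your worker nodes
  policy_arn="arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command so it can be reviewed before anything changes.
    echo "aws iam attach-role-policy --role-name $role_name --policy-arn $policy_arn"
  else
    aws iam attach-role-policy --role-name "$role_name" --policy-arn "$policy_arn"
  fi
}

# Usage (my-node-role is a placeholder):
#   DRY_RUN=1 attach_cni_policy my-node-role
#   attach_cni_policy my-node-role
#   aws iam list-attached-role-policies --role-name my-node-role
```

The aws-node pods may need to be restarted, or may simply recover on their next CrashLoopBackOff retry, before they pick up the new permissions.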

I used the eksctl command-line tool with the --nodes flag, and everything was created successfully as expected.

eksctl create cluster --name cluster-name \
  --nodes 3 \
  --node-type=t3.large \
  --region=eu-west-1
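Once the cluster is up, one way to confirm that the aws-node and coredns pods are healthy is to check the READY column of the pod listing. The sketch below assumes the default table output of kubectl get pods; not_ready_pods is a hypothetical helper name.

```shell
# List pods whose READY column shows fewer ready containers than expected.
# Reads the default `kubectl get pods` table on stdin.
# not_ready_pods is a hypothetical helper name.
not_ready_pods() {
  # Skip the header row, split READY ("0/1") on "/", print the name on mismatch.
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage:
#   kubectl get pods -n kube-system | not_ready_pods
```

Empty output means every pod in the namespace reports all of its containers as ready.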