AWS EKS node creation failure
I have a cluster in AWS that I created by following these instructions.
Then I tried to add nodes to this cluster by following this documentation.
It seems the nodes cannot be created, and the vpc-cni and coredns add-ons report this health issue:
Health issue type: insufficientNumberOfReplicas. The add-on is unhealthy because it doesn't have the desired number of replicas.
Status of the pods (kubectl get pods -n kube-system):
NAME                       READY   STATUS             RESTARTS   AGE
aws-node-9cwkd             0/1     CrashLoopBackOff   13         42m
aws-node-h4qjt             0/1     CrashLoopBackOff   13         42m
aws-node-jrn5x             0/1     CrashLoopBackOff   13         43m
coredns-745979c988-25fcc   0/1     Pending            0          120m
coredns-745979c988-qvh7h   0/1     Pending            0          120m
kube-proxy-2bmlq           1/1     Running            0          42m
kube-proxy-hjcrw           1/1     Running            0          43m
kube-proxy-j9r9n           1/1     Running            0          42m
Logs of the aws-node-9cwkd pod:
{"level":"info","ts":"2021-11-30T14:11:14.156Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2021-11-30T14:11:14.157Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2021-11-30T14:11:14.177Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-11-30T14:11:14.179Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-11-30T14:11:16.189Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:18.198Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:20.205Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:22.215Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:24.226Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
Running kubectl describe pod aws-node-h4qjt -n kube-system shows the following error:
Readiness probe failed: {"level":"info","ts":"2021-11-30T14:11:07.145Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
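The checks above (pod status, aws-node logs, and the readiness-probe event reported by kubectl describe) can be gathered in one pass. This is a minimal bash sketch, assuming kubectl already points at the affected cluster and that the aws-node pods carry the default k8s-app=aws-node label and a container named aws-node; the function name collect_cni_diagnostics is hypothetical.

```shell
# Hypothetical helper: collect the kube-system diagnostics for the VPC CNI.
# Assumes kubectl is configured for the affected cluster and that the
# aws-node DaemonSet pods use the default label k8s-app=aws-node.
collect_cni_diagnostics() {
  local ns="kube-system"

  # Status of every pod in kube-system (aws-node, coredns, kube-proxy).
  kubectl get pods -n "$ns"

  # For each aws-node pod, print recent container logs and the full
  # description; the readiness-probe failure appears under Events.
  local pod
  for pod in $(kubectl get pods -n "$ns" -l k8s-app=aws-node -o name); do
    echo "=== $pod ==="
    kubectl logs -n "$ns" "$pod" -c aws-node --tail=20
    kubectl describe -n "$ns" "$pod"
  done
}
```

Usage: run collect_cni_diagnostics and redirect the output to a file if you want to attach it to a support request.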
Any help getting the nodes created successfully in the cluster would be appreciated.
This is most likely a problem with the node service role. You can get more information by exec-ing into the pod and looking at ipamd.log:
kubectl exec -it aws-node-9cwkd -n kube-system -- /bin/bash
cat /host/var/log/aws-routed-eni/ipamd.log
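The same log can be read without an interactive shell and filtered down to error-level entries, since ipamd writes one JSON object per line with a "level" field. This is a minimal bash sketch; the function names fetch_ipamd_log and ipamd_errors are hypothetical, and it assumes the container inside the aws-node pod is named aws-node and that the log path matches the one used above.

```shell
# Hypothetical helper: stream ipamd.log out of an aws-node pod.
# $1 is the pod name, e.g. aws-node-9cwkd.
fetch_ipamd_log() {
  kubectl exec -n kube-system "$1" -c aws-node -- \
    cat /host/var/log/aws-routed-eni/ipamd.log
}

# Hypothetical filter: keep only error-level JSON log lines read from stdin.
# grep runs locally, so nothing extra is needed inside the container.
ipamd_errors() {
  grep -F '"level":"error"'
}
```

Usage: fetch_ipamd_log aws-node-9cwkd | ipamd_errors. An UnauthorizedOperation (status code 403) on an ec2: call in the output points to missing IAM permissions on the node role.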
Here is an example of the error I saw when I hit the same problem:
{"level":"error","ts":"2021-12-02T13:27:51.464Z","caller":"ipamd/ipamd.go:444","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0c01bd25ae6999ed5]: UnauthorizedOperation: You are not authorized to perform this operation.\n\tstatus code: 403, request id: 0438b84b-8052-4f31-9d63-c2ff7512f131"}
In my case, I had to attach the AmazonEKS_CNI_Policy policy to the node IAM role.
https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
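Attaching the policy can be scripted with the AWS CLI. This is a minimal bash sketch, assuming AWS CLI credentials are configured and that the nodes belong to an EKS managed node group (for self-managed nodes, look up the instance role another way); the function names node_role_name and attach_cni_policy are hypothetical, and the cluster and node group names are placeholders you supply.

```shell
# AWS managed policy that grants the VPC CNI its EC2 networking permissions.
CNI_POLICY_ARN="arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"

# Hypothetical helper: print the IAM role name used by a managed node group.
# $1 = cluster name, $2 = node group name (both placeholders).
node_role_name() {
  local cluster="$1" nodegroup="$2" role_arn
  role_arn="$(aws eks describe-nodegroup \
    --cluster-name "$cluster" --nodegroup-name "$nodegroup" \
    --query 'nodegroup.nodeRole' --output text)" || return 1
  # The role name is the last path segment of the role ARN.
  echo "${role_arn##*/}"
}

# Hypothetical helper: attach AmazonEKS_CNI_Policy to that role.
attach_cni_policy() {
  local role
  role="$(node_role_name "$1" "$2")" || return 1
  aws iam attach-role-policy --role-name "$role" --policy-arn "$CNI_POLICY_ARN"
}
```

Usage: attach_cni_policy my-cluster my-nodegroup. The aws-node pods in CrashLoopBackOff should pick up the new permissions on their next restart; kubectl -n kube-system rollout restart daemonset aws-node forces one.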
I used the eksctl command-line tool with the --nodes flag, and everything was created successfully as expected:
eksctl create cluster --name cluster-name \
--nodes 3 \
--node-type=t3.large \
--region=eu-west-1
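After the cluster is created, a quick check confirms that the nodes joined and that the VPC CNI and coredns became ready. This is a minimal bash sketch, assuming eksctl has written the kubeconfig for the new cluster (its default behaviour); the function name verify_cluster and the 300-second timeout are arbitrary choices.

```shell
# Hypothetical helper: verify nodes, aws-node and coredns after creation.
verify_cluster() {
  # Nodes should report Ready once the aws-node pod on each one is healthy.
  kubectl get nodes -o wide
  # Wait for every aws-node pod (one per node) to become ready.
  kubectl rollout status daemonset/aws-node -n kube-system --timeout=300s
  # coredns pods cannot be scheduled while no node is Ready, so they
  # stay Pending until the CNI is working.
  kubectl rollout status deployment/coredns -n kube-system --timeout=300s
}
```

If the aws-node rollout times out, the same CrashLoopBackOff and IAM checks described earlier in this thread apply.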