EKS Anywhere cluster cert-manager i/o timeout

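I am trying the EKS Anywhere Docker provider deployment for the first time, following this guide: https://anywhere.eks.amazonaws.com/docs/getting-started/local-environment/

It gets stuck at 'Waiting for cert-manager'. I am working on CentOS 7, and the system is behind a proxy.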

Installing cert-manager Version="v1.5.3+66e1acc"
Using Override="cert-manager.yaml" Provider="cert-manager" Version="v1.5.3+66e1acc"
Waiting for cert-manager to be available...
Error: timed out waiting for the condition

Only the cert-manager pods fail to pull their images:

  NAMESPACE            NAME                                                              READY   STATUS             RESTARTS   AGE
  cert-manager         cert-manager-7988d4fb6c-bjhsv                                     0/1     ImagePullBackOff   0          5m54s
  cert-manager         cert-manager-cainjector-6bc8dcdb64-hvdx5                          0/1     ImagePullBackOff   0          5m55s
  cert-manager         cert-manager-webhook-68979bfb95-q8ttt                             0/1     ImagePullBackOff   0          5m54s
  kube-system          coredns-745c7986c7-2wrx5                                          1/1     Running            0          5m57s
  kube-system          coredns-745c7986c7-kx594                                          1/1     Running            0          5m57s
  kube-system          etcd-dev-cluster-eks-a-cluster-control-plane                      1/1     Running            0          5m52s
  kube-system          kindnet-4jcvt                                                     1/1     Running            0          5m57s
  kube-system          kube-apiserver-dev-cluster-eks-a-cluster-control-plane            1/1     Running            0          5m52s
  kube-system          kube-controller-manager-dev-cluster-eks-a-cluster-control-plane   1/1     Running            0          5m52s
  kube-system          kube-proxy-4dk2r                                                  1/1     Running            0          5m57s
  kube-system          kube-scheduler-dev-cluster-eks-a-cluster-control-plane            1/1     Running            0          5m52s
  local-path-storage   local-path-provisioner-666bfc797f-nkhqf                           1/1     Running            0          5m57s
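
To pick the failing pods out of a longer listing, a small filter on the STATUS column can help. This is only a sketch: `failing_image_pulls` is a hypothetical helper name, and it assumes the default column order shown above (NAMESPACE, NAME, READY, STATUS).

```shell
# Print namespace/name for every pod whose STATUS column reports an image pull failure.
# Reads `kubectl get pods -A` style output on stdin; the header row never matches.
failing_image_pulls() {
  awk '$4 == "ImagePullBackOff" || $4 == "ErrImagePull" { print $1 "/" $2 }'
}

# Usage: kubectl get pods -A | failing_image_pulls
```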

The same images can be pulled on the host with docker pull:
 public.ecr.aws/eks-anywhere/jetstack/cert-manager-webhook      v1.5.3-eks-a-6                 194bcfda671e   3 months ago    46MB
 public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller   v1.5.3-eks-a-6                 1e6749016508   3 months ago    61.3MB
 public.ecr.aws/eks-anywhere/jetstack/cert-manager-cainjector   v1.5.3-eks-a-6                 45723d794a88   3 months ago    42.4MB

kubectl describe reports the following i/o timeout error, along with a 'server misbehaving' error:

 Failed to pull image "public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller:v1.5.3-eks-a-6": rpc error: code = Unknown desc = failed to pull and unpack image "public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller:v1.5.3-eks-a-6": failed to resolve reference "public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller:v1.5.3-eks-a-6": failed to do request: Head "https://public.ecr.aws/v2/eks-anywhere/jetstack/cert-manager-controller/manifests/v1.5.3-eks-a-6": dial tcp: lookup public.ecr.aws on 172.19.0.1:53: read udp 172.19.0.2:38941->172.19.0.1:53: i/o timeout
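
The message above shows the node container (172.19.0.2) timing out on a DNS lookup against 172.19.0.1:53, so one way to confirm the symptom is to try resolving the registry name from inside the node container. The following is a minimal sketch: `resolves_in_node` is a hypothetical helper, and it assumes `getent` is available in the node image.

```shell
# Return success if a hostname resolves from inside a node container.
# $1: node container name (as listed by `docker ps`), $2: hostname to resolve.
# Assumption: the node image ships `getent`.
resolves_in_node() {
  docker exec "$1" getent hosts "$2" >/dev/null 2>&1
}

# Usage (the container name is a placeholder):
# resolves_in_node <container name> public.ecr.aws && echo "DNS ok" || echo "DNS lookup failed"
```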

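This turned out to be a proxy-related issue. I resolved it by adding the proxy configuration to the containerd service inside the node's Docker container and then restarting containerd.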

docker exec -it <container name> bash

Inside the container:

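# Create the systemd drop-in directory for containerd
mkdir -p /etc/systemd/system/containerd.service.d
# Write the proxy settings; replace <proxy-ip> and <proxy-port> with your proxy's address.
# ${NO_PROXY} and ${LOCAL_NETWORK} are expanded by the shell here, so set them beforehand.
cat <<EOF >/etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://<proxy-ip>:<proxy-port>"
Environment="HTTPS_PROXY=http://<proxy-ip>:<proxy-port>"
Environment="NO_PROXY=${NO_PROXY:-localhost},${LOCAL_NETWORK}"
EOF
# Reload systemd and restart containerd so the new environment takes effect
systemctl daemon-reload
systemctl restart containerd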
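
The same steps can probably be run from the host without an interactive shell, which is handy when there is more than one node container. This is a sketch under assumptions, not a tested procedure: `render_proxy_dropin` and `apply_proxy_dropin` are hypothetical helper names, and the proxy URL and NO_PROXY list are values you must supply yourself.

```shell
# Print a containerd proxy drop-in to stdout.
# $1: proxy URL, $2: comma-separated NO_PROXY list (both supplied by the caller).
render_proxy_dropin() {
  printf '[Service]\n'
  printf 'Environment="HTTP_PROXY=%s"\n' "$1"
  printf 'Environment="HTTPS_PROXY=%s"\n' "$1"
  printf 'Environment="NO_PROXY=%s"\n' "$2"
}

# Install the drop-in in one node container and restart containerd.
# $1: node container name, $2: proxy URL, $3: NO_PROXY list.
apply_proxy_dropin() {
  docker exec "$1" mkdir -p /etc/systemd/system/containerd.service.d
  render_proxy_dropin "$2" "$3" |
    docker exec -i "$1" sh -c 'cat > /etc/systemd/system/containerd.service.d/http-proxy.conf'
  docker exec "$1" systemctl daemon-reload
  docker exec "$1" systemctl restart containerd
}

# Usage (all values are placeholders):
# apply_proxy_dropin <container name> http://<proxy-ip>:<proxy-port> localhost,127.0.0.1
```

Afterwards, `docker exec <container name> systemctl show containerd --property=Environment` should list the proxy variables if the drop-in was picked up.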