Failed to pull image "velero/velero-plugin-for-gcp:v1.1.0" while installing Velero in GKE Cluster
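I'm trying to install and configure Velero for Kubernetes backups. I followed this link to configure it in my GKE cluster. The installation went fine, but Velero is not working.
I'm using Google Cloud Shell to run all my commands (I have installed and configured the velero client in my Cloud Shell).
On further inspection of the velero deployment and velero pods, I found that it cannot pull the image from the Docker repository.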
kubectl get pods -n velero
NAME                      READY   STATUS              RESTARTS   AGE
velero-5489b955f6-kqb7z   0/1     Init:ErrImagePull   0          20s
Error from the velero pod (kubectl describe pod) (output edited for readability; only the relevant information is shown below):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 38s default-scheduler Successfully assigned velero/velero-5489b955f6-kqb7z to gke-gke-cluster1-default-pool-a354fba3-8674
Warning Failed 22s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Failed to pull image "velero/velero-plugin-for-gcp:v1.1.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 22s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Error: ErrImagePull
Normal BackOff 21s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Back-off pulling image "velero/velero-plugin-for-gcp:v1.1.0"
Warning Failed 21s kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Error: ImagePullBackOff
Normal Pulling 8s (x2 over 37s) kubelet, gke-gke-cluster1-default-pool-a354fba3-8674 Pulling image "velero/velero-plugin-for-gcp:v1.1.0"
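The events show the kubelet trying to reach registry-1.docker.io. The status reads Init:ErrImagePull because Velero plugins run as init containers, so the failing image is the plugin rather than the Velero server itself. A quick way to list every image the Velero pods need, together with the Warning events, is sketched below. This is a hedged sketch: the function names and the DRY_RUN switch are illustrative additions, not part of Velero or kubectl, and it assumes kubectl already points at the cluster.

```shell
#!/usr/bin/env bash
# Sketch: list the images referenced by the Velero pods and the Warning
# events in their namespace. show_velero_images and DRY_RUN are hypothetical
# helpers; with DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

show_velero_images() {
  local ns="${1:-velero}"
  # One line per pod: name, init-container images (plugins), container images.
  run kubectl get pods -n "$ns" \
    -o 'jsonpath={range .items[*]}{.metadata.name}{": "}{.spec.initContainers[*].image}{" "}{.spec.containers[*].image}{"\n"}{end}'
  # Only Warning events, such as Failed (ErrImagePull / ImagePullBackOff).
  run kubectl get events -n "$ns" --field-selector type=Warning
}
```

Usage: `show_velero_images velero`. In this cluster, both images listed (the plugin and the Velero server) would come from Docker Hub.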
Command used to install velero (some values are given as variables):
velero install \
--provider gcp \
--plugins velero/velero-plugin-for-gcp:v1.1.0 \
--bucket $storagebucket \
--secret-file ~/velero-backup-storage-sa-key.json
Velero version
velero version
Client:
Version: v1.4.2
Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
<error getting server version: timed out waiting for server status request to be processed>
GKE version
v1.15.12-gke.2
Isn't this a private cluster? – mario 31 mins ago
@mario This is a private cluster, but I can deploy other services without any issues (e.g. I have deployed nginx successfully). – Sreesan 15 mins ago
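Whether a cluster has private nodes can be read from its configuration. The sketch below assumes gcloud is authenticated for the project; CLUSTER_NAME and ZONE are placeholders, a regional cluster would need --region instead of --zone, and the helper names are hypothetical. The expected output ("True" for private nodes, empty otherwise) is an assumption about how gcloud's value() format renders the field.

```shell
#!/usr/bin/env bash
# Sketch: report whether a GKE cluster was created with private nodes.
# is_private_cluster and DRY_RUN are hypothetical helpers; with DRY_RUN=1
# the gcloud command is printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

is_private_cluster() {
  local cluster="$1" zone="$2"
  # Expected to print "True" for a private-node cluster (assumption).
  run gcloud container clusters describe "$cluster" \
    --zone "$zone" \
    --format='value(privateClusterConfig.enablePrivateNodes)'
}
```

Usage: `is_private_cluster CLUSTER_NAME ZONE`.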
Well, this is a known limitation of GKE private clusters. As you can read in the documentation:
Can't pull image from public Docker Hub
Symptoms
A Pod running in your cluster displays a warning in kubectl describe such as Failed to pull image: rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Potential causes
Nodes in a private cluster do not have outbound access to the public internet. They have limited access to Google APIs and services, including Container Registry.
Resolution
You cannot fetch images directly from Docker Hub. Instead, use images hosted on Container Registry. Note that while Container Registry's Docker Hub mirror is accessible from a private cluster, it should not be exclusively relied upon. The mirror is only a cache, so images are periodically removed, and a private cluster is not able to fall back to Docker Hub.
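Following the documentation's "use images hosted on Container Registry" could look like the sketch below: copy the two Docker Hub images Velero needs (the server image and the GCP plugin) into a Container Registry repository in your own project, then point velero install at them. Assumptions, all hedged: it runs on a machine that can reach Docker Hub (Cloud Shell can); PROJECT_ID is a placeholder; `gcloud auth configure-docker` has already been run; the nodes' service account can read that project's registry; and velero/velero:v1.4.2 is the server image matching the v1.4.2 client. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: mirror the Velero images into gcr.io/PROJECT_ID and install from
# there. mirror_to_gcr, install_velero_from_gcr and DRY_RUN are hypothetical
# helpers; with DRY_RUN=1 the commands are printed instead of executed.
# Prerequisite (assumption): gcloud auth configure-docker was run once.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pull from Docker Hub, retag under gcr.io/<project>/<name:tag>, push.
mirror_to_gcr() {
  local src="$1" project="$2"
  local dst="gcr.io/${project}/${src##*/}"
  run docker pull "$src"
  run docker tag "$src" "$dst"
  run docker push "$dst"
}

# Same install as in the question, but both images come from gcr.io.
install_velero_from_gcr() {
  local project="$1" bucket="$2" keyfile="$3"
  run velero install \
    --provider gcp \
    --image "gcr.io/${project}/velero:v1.4.2" \
    --plugins "gcr.io/${project}/velero-plugin-for-gcp:v1.1.0" \
    --bucket "$bucket" \
    --secret-file "$keyfile"
}
```

Usage: `mirror_to_gcr velero/velero:v1.4.2 PROJECT_ID`, `mirror_to_gcr velero/velero-plugin-for-gcp:v1.1.0 PROJECT_ID`, then `install_velero_from_gcr PROJECT_ID "$storagebucket" ~/velero-backup-storage-sa-key.json`. Unlike the Docker Hub mirror, a copy in your own project is not evicted from a cache.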
You can also compare this with that answer.
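You can easily verify this with a simple experiment. Try running two different nginx deployments: the first based on the image nginx (which is equivalent to nginx:latest), the second based on nginx:1.14.2.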
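The experiment can be sketched with two kubectl create deployment calls. The deployment names nginx-latest and nginx-pinned are hypothetical, as are the helper functions; kubectl create deployment labels its pods app=<deployment name>, which the final command uses to show both sets of pods. Per the answer, the first should reach Running while the second ends in ErrImagePull / ImagePullBackOff.

```shell
#!/usr/bin/env bash
# Sketch of the nginx experiment. nginx_experiment, nginx_cleanup and DRY_RUN
# are hypothetical helpers; with DRY_RUN=1 commands are printed, not executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

nginx_experiment() {
  # Untagged "nginx" means nginx:latest, which the answer says the mirror serves.
  run kubectl create deployment nginx-latest --image=nginx
  # Pinned older tag: per the answer, not in the mirror, so the pull fails.
  run kubectl create deployment nginx-pinned --image=nginx:1.14.2
  # Pods created this way carry the label app=<deployment name>.
  run kubectl get pods -l 'app in (nginx-latest,nginx-pinned)'
}

nginx_cleanup() {
  run kubectl delete deployment nginx-latest nginx-pinned
}
```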
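The first case works, because the nginx:latest image can be pulled from Container Registry's Docker Hub mirror, which is accessible from a private cluster. Any attempt to pull nginx:1.14.2, however, will fail, as you will see in the Pod events. This happens because the kubelet cannot find this version of the image in GCR, so it tries to pull it from the public Docker registry (https://registry-1.docker.io/v2/), which is not possible in private clusters. As you can read in the documentation: "The mirror is only a cache, so images are periodically removed, and a private cluster is not able to fall back to Docker Hub."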
If you still have doubts, just ssh into your node and try running the following commands:
curl https://cloud.google.com/container-registry/
curl https://registry-1.docker.io/v2/
The first one works, but the second one eventually fails:
curl: (7) Failed to connect to registry-1.docker.io port 443: Connection timed out
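A node without an external IP usually cannot be reached with a plain SSH connection from Cloud Shell, so the sketch below runs the same curl checks through Identity-Aware Proxy. Assumptions: a firewall rule lets IAP's TCP forwarding range (35.235.240.0/20) reach the node on port 22; NODE_NAME and ZONE are placeholders; curl exists on the node image, as the answer implies. The helper names are hypothetical. Because of --max-time, the Docker Hub failure may be reported with a different error than the one quoted above.

```shell
#!/usr/bin/env bash
# Sketch: run a curl probe on a GKE node over IAP-tunnelled SSH.
# probe_from_node and DRY_RUN are hypothetical helpers; with DRY_RUN=1 the
# gcloud command is printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

probe_from_node() {
  local node="$1" zone="$2" url="$3"
  # Prints the HTTP status on success; a private node should time out on
  # Docker Hub instead of getting any HTTP response.
  run gcloud compute ssh "$node" --zone "$zone" --tunnel-through-iap \
    --command "curl -sS -o /dev/null -w '%{http_code}\n' --max-time 15 $url"
}
```

Usage: `probe_from_node NODE_NAME ZONE https://cloud.google.com/container-registry/` and `probe_from_node NODE_NAME ZONE https://registry-1.docker.io/v2/`.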
The reason? "Nodes in a private cluster do not have outbound access to the public internet."
The solution?
You can search here for what is currently available in GCR.
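In many cases, if you don't specify an exact version (the latest tag is used by default), you should be able to get the image you need. While this helps with nginx, unfortunately no version of velero/velero-plugin-for-gcp is currently available in Google Container Registry's Docker Hub mirror.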
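Why "nginx" and "nginx:latest" are the same image comes down to how Docker expands short Docker Hub references: official images get an implicit library/ namespace, and a missing tag becomes :latest. The hypothetical helper below shows that expansion, optionally against another registry host. Using mirror.gcr.io as the mirror's hostname is an assumption on my part (the answer does not name it), and digests or references that already name another registry are out of scope for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: expand a Docker Hub image reference the way Docker does.
# normalize_dockerhub_ref is a hypothetical helper. The optional second
# argument swaps the registry host (default docker.io).

normalize_dockerhub_ref() {
  local ref="$1" registry="${2:-docker.io}" name tag
  # A ':' in the last path component separates the tag; default is "latest".
  case "${ref##*/}" in
    *:*) name="${ref%:*}"; tag="${ref##*:}" ;;
    *)   name="$ref"; tag="latest" ;;
  esac
  # Official images (no namespace) live under "library/".
  case "$name" in
    */*) ;;
    *)   name="library/$name" ;;
  esac
  printf '%s/%s:%s\n' "$registry" "$name" "$tag"
}
```

For example, `normalize_dockerhub_ref nginx` gives docker.io/library/nginx:latest, and `normalize_dockerhub_ref velero/velero-plugin-for-gcp:v1.1.0 mirror.gcr.io` gives the path the plugin would have in the mirror, had it been cached.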
Granting private nodes outbound internet access by using Cloud NAT seems to be the only reasonable solution that can be applied in your case.
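A minimal sketch of that Cloud NAT setup, followed by a Velero restart so the image pull is retried immediately instead of after the back-off. Assumptions: NETWORK is the cluster's VPC network and REGION the region of its nodes (both placeholders); the router and NAT names nat-router and nat-config are hypothetical, as are the helper functions.

```shell
#!/usr/bin/env bash
# Sketch: give private nodes outbound internet access via Cloud NAT, then
# restart Velero. enable_cloud_nat, restart_velero and DRY_RUN are
# hypothetical helpers; with DRY_RUN=1 commands are printed, not executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

enable_cloud_nat() {
  local network="$1" region="$2"
  # Cloud NAT configurations are attached to a Cloud Router.
  run gcloud compute routers create nat-router \
    --network "$network" --region "$region"
  # NAT every subnet range in the region, with auto-allocated external IPs.
  run gcloud compute routers nats create nat-config \
    --router nat-router --router-region "$region" \
    --nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips
}

restart_velero() {
  # Recreate the pod so the pull from Docker Hub is retried right away.
  run kubectl -n velero rollout restart deployment/velero
  run kubectl -n velero rollout status deployment/velero
}
```

Usage: `enable_cloud_nat NETWORK REGION`, then `restart_velero`; afterwards `velero version` should also report a server version.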