cos-extensions install gpu fails to download driver signature on GCP Compute Engine VM
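I am working with GPU-enabled VMs on GCP Compute Engine.
As the OS I use Container-Optimized OS (COS 89-16108.403.47 LTS), which supports simple GPU driver installation by running 'cos-extensions install gpu' over SSH (see the Google doc).
This worked fine until a few days ago, when I started getting an error message saying that a driver signature download failed (see the full error message below). Since then I have not been able to get it working.
Can anyone confirm that I am hitting a bug here, or help me fix this?
Thanks a lot!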
~ $ cos-extensions install gpu
Unable to find image 'gcr.io/cos-cloud/cos-gpu-installer:v2.0.3' locally
v2.0.3: Pulling from cos-cloud/cos-gpu-installer
419e7ae5bb1e: Pull complete
6f6ec2441524: Pull complete
11d24f918ba9: Pull complete
Digest: sha256:1cf2701dc2c3944a93fd06cb6c9eedfabf323425483ba3af294510621bb37d0e
Status: Downloaded newer image for gcr.io/cos-cloud/cos-gpu-installer:v2.0.3
I0618 06:33:49.227680 1502 main.go:21] Checking if this is the only cos_gpu_installer that is running.
I0618 06:33:49.258483 1502 install.go:74] Running on COS build id 16108.403.47
I0618 06:33:49.258505 1502 installer.go:187] Getting the default GPU driver version
I0618 06:33:49.285265 1502 utils.go:72] Downloading gpu_default_version from https://storage.googleapis.com/cos-tools/16108.403.47/gpu_default_version
I0618 06:33:49.353149 1502 utils.go:120] Successfully downloaded gpu_default_version from https://storage.googleapis.com/cos-tools/16108.403.47/gpu_default_version
I0618 06:33:49.353381 1502 install.go:85] Installing GPU driver version 450.119.04
I0618 06:33:49.353461 1502 cache.go:69] error: failed to read file /root/var/lib/nvidia/.cache: open /root/var/lib/nvidia/.cache: no such file or directory
I0618 06:33:49.353482 1502 install.go:120] Did not find cached version, installing the drivers...
I0618 06:33:49.353491 1502 installer.go:82] Configuring driver installation directories
I0618 06:33:49.421021 1502 installer.go:196] Updating container's ld cache
I0618 06:33:49.526673 1502 signature.go:30] Downloading driver signature for version 450.119.04
I0618 06:33:49.526712 1502 utils.go:72] Downloading 450.119.04.signature.tar.gz from https://storage.googleapis.com/cos-tools/16108.403.47/extensions/gpu/450.119.04.signature.tar.gz
E0618 06:33:49.657028 1502 artifacts.go:106] Failed to download extensions/gpu/450.119.04.signature.tar.gz from public GCS: failed to download 450.119.04.signature.tar.gz, status: 404 Not Found
E0618 06:33:49.657487 1502 install.go:175] failed to download driver signature: failed to download driver signature for version 450.119.04: failed to download extensions/gpu/450.119.04.signature.tar.gz
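This appears to be a known issue: it is reported here, and there is a similar thread with workarounds here.
There seems to be a delay between the release of a new COS version and the release of the updated drivers for it.
However, I just ran cos-extensions list, and it looks like the drivers are available now: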
$ cos-extensions list
Available extensions for COS version 89-16108.403.47:
[gpu]
450.119.04 [default]
450.80.02
The signature is available as well:
$ wget https://storage.googleapis.com/cos-tools/16108.403.47/extensions/gpu/450.119.04.signature.tar.gz
--2021-06-21 12:49:58-- https://storage.googleapis.com/cos-tools/16108.403.47/extensions/gpu/450.119.04.signature.tar.gz
Resolving storage.googleapis.com... 173.194.198.128, 64.233.191.128, 173.194.74.128, ...
Connecting to storage.googleapis.com|173.194.198.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4588 (4.5K) [application/octet-stream]
Saving to: '450.119.04.signature.tar.gz'
450.119.04.signature.tar.gz 100%[=============================================>] 4.48K --.-KB/s in 0s
2021-06-21 12:49:58 (62.0 MB/s) - '450.119.04.signature.tar.gz' saved [4588/4588]
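If you hit this again after a COS update, it may help to check whether the signature has been published before running the installer. Below is a rough sketch of that check, not an official tool. The function names are made up for illustration, the URL pattern is copied from the installer log above (it could change in other COS releases), and I am assuming curl is available on the VM.

```shell
#!/bin/sh
# Hypothetical helpers; none of these are part of cos-extensions itself.

# Read `cos-extensions list` output on stdin and print the version
# on the line marked [default].
default_gpu_version() {
  awk '/\[default\]/ { print $1; exit }'
}

# Build the signature URL for a COS build id ($1) and driver version ($2),
# following the pattern seen in the installer log (an assumption).
signature_url() {
  printf 'https://storage.googleapis.com/cos-tools/%s/extensions/gpu/%s.signature.tar.gz\n' "$1" "$2"
}

# Exit 0 only if a HEAD request for the signature succeeds;
# curl -f turns an HTTP error such as 404 into a non-zero exit code.
signature_available() {
  curl -fsSI "$(signature_url "$1" "$2")" >/dev/null
}

# Example usage (needs network access, so it is left commented out):
#   version="$(cos-extensions list | default_gpu_version)"
#   signature_available 16108.403.47 "$version" && cos-extensions install gpu
```

With the values from this thread, signature_url 16108.403.47 450.119.04 produces the same URL that wget fetched above, so a failing signature_available would suggest the signature for that build has not been published yet.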