Cannot use GPU on Minikube with Docker driver
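Goal:
I am trying to use Nvidia GPU capabilities on a Minikube cluster that uses the default Docker driver.
Problem:
I can use nvidia-docker in the default docker context, but after switching to the minikube docker-env context I get the following error: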
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
Environment:
- Ubuntu 18.04
- Minikube v1.10.0
- Docker version:
$ docker version
Client: Docker Engine - Community
Version: 19.03.10
API version: 1.40
Go version: go1.13.10
Git commit: 9424aeaee9
Built: Thu May 28 22:16:49 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.2
API version: 1.40 (minimum version 1.12)
Go version: go1.12.9
Git commit: 6a30dfca03
Built: Wed Sep 11 22:45:55 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.3.3-14-g449e9269
GitCommit: 449e926990f8539fd00844b26c07e2f1e306c760
runc:
Version: 1.0.0-rc10
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
- Nvidia container runtime version:
$ nvidia-container-runtime --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
Additional info:
The cluster was created with:
minikube start --cpus 3 --memory 8G
The following minikube addons are currently enabled:
$ minikube addons list
|-----------------------------|----------|--------------|
| ADDON NAME | PROFILE | STATUS |
|-----------------------------|----------|--------------|
| dashboard | minikube | disabled |
| default-storageclass | minikube | enabled ✅ |
| efk | minikube | disabled |
| freshpod | minikube | disabled |
| gvisor | minikube | disabled |
| helm-tiller | minikube | disabled |
| ingress | minikube | disabled |
| ingress-dns | minikube | disabled |
| istio | minikube | disabled |
| istio-provisioner | minikube | disabled |
| logviewer | minikube | disabled |
| metallb | minikube | disabled |
| metrics-server | minikube | disabled |
| nvidia-driver-installer | minikube | enabled ✅ |
| nvidia-gpu-device-plugin | minikube | enabled ✅ |
| registry | minikube | disabled |
| registry-aliases | minikube | disabled |
| registry-creds | minikube | disabled |
| storage-provisioner | minikube | enabled ✅ |
| storage-provisioner-gluster | minikube | disabled |
|-----------------------------|----------|--------------|
Here is a working example outside the minikube context:
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
Fri Jun 5 09:23:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 On | N/A |
| 0% 51C P8 6W / 120W | 1293MiB / 6077MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
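This is a community wiki answer. Feel free to edit and expand it if needed.
Nvidia GPUs are not officially supported with Minikube's docker driver. This leaves you with two possible options:
- Try the NVIDIA Container Toolkit together with the NVIDIA device plugin. This is a workaround and may not be the best solution for your use case.
- Use the KVM2 driver or the None driver. Both are officially supported and documented.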
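For the second option, here is a minimal sketch of what the start commands could look like. The `--kvm-gpu` flag is the one Minikube documented for the KVM2 driver around v1.10. The `gpu_start_cmd` helper is a hypothetical name, used only to keep the example self-contained: it prints the command instead of running it. The sketch assumes PCI passthrough (IOMMU) is already configured on the host for KVM2, and that the host Docker daemon already works with `--gpus all` for the None driver.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the minikube start command
# for a driver with documented GPU support, so it can be reviewed first.
gpu_start_cmd() {
  case "$1" in
    kvm2)
      # Assumption: IOMMU / PCI passthrough is already set up on the host.
      echo "minikube start --driver=kvm2 --kvm-gpu"
      ;;
    none)
      # Assumption: the host Docker daemon can already run GPU containers.
      echo "sudo minikube start --driver=none"
      ;;
    *)
      echo "no documented GPU support for driver: $1" >&2
      return 1
      ;;
  esac
}

# Example: show the command for the KVM2 driver.
gpu_start_cmd kvm2
```

With KVM2, the `nvidia-driver-installer` and `nvidia-gpu-device-plugin` addons listed in the question are the ones to enable after the cluster starts. With the None driver, the NVIDIA device plugin has to be deployed to the cluster separately.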
I hope this helps.