Rapids / docker: could not select device driver "" with capabilities: [[gpu]]
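I'm new to Rapids and have rarely had a good experience with conda, so I'm trying the containerized version. I'm also new to Docker, and the combination of unknowns is keeping me from solving the problem.
I have an Ubuntu 18.04 server,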
# uname -v
#30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020
on which I installed a recent version of Docker
# apt-get install docker docker-ce docker-ce-cli containerd.io
# docker --version
Docker version 19.03.8, build afacb8b7f0
CUDA v10.2 is installed on the machine
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
as is Python v3.6.9
# python3 --version
Python 3.6.9
As shown in the NVIDIA Container Toolkit Quickstart section, I installed the nvidia-docker list into /etc/apt/sources.list.d/
# curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
# curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
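explicitly substituting ubuntu18.04 for $distribution, because that is the Ubuntu equivalent of Linux Mint 19.3.
Following the instructions for starting the container and notebook server in RAPIDS - Open GPU Data Science, I pulled the 0.13-cuda10.2-runtime-ubuntu18.04-py3.6 runtime.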
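The manual substitution described above can be sketched as a small helper. This is only an illustration: `repo_distribution` is a hypothetical function name, and the codename table covers just a few Ubuntu releases. It assumes a derivative such as Linux Mint exposes `UBUNTU_CODENAME` in /etc/os-release.

```shell
#!/bin/sh
# Map an os-release ID / VERSION_ID / UBUNTU_CODENAME triple to the
# "$distribution" string used in the nvidia-docker repository URL.
# Hypothetical helper; it only handles Ubuntu and Ubuntu derivatives.
repo_distribution() {
    id=$1; version_id=$2; codename=$3
    case "$id" in
        ubuntu)
            echo "ubuntu$version_id" ;;
        *)
            # Derivative distro: fall back to the underlying Ubuntu release.
            case "$codename" in
                xenial) echo "ubuntu16.04" ;;
                bionic) echo "ubuntu18.04" ;;
                focal)  echo "ubuntu20.04" ;;
                *)      echo "unknown"; return 1 ;;
            esac ;;
    esac
}

# Usage on a live system (os-release fields are sourced in a subshell):
# ( . /etc/os-release; repo_distribution "$ID" "$VERSION_ID" "$UBUNTU_CODENAME" )
```

On Linux Mint 19.3 this should yield ubuntu18.04, the same value that was typed in by hand above.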
# docker pull rapidsai/rapidsai:0.13-cuda10.2-runtime-ubuntu18.04-py3.6
It took a long time and several GB, but everything seemed OK. (No warning or error messages.) Also, the image appears to be registered with Docker.
# docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
rapidsai/rapidsai 0.13-cuda10.2-runtime-ubuntu18.04-py3.6 c7440af853b5 4 days ago 9.26GB
rapidsai/rapidsai cuda10.2-runtime-ubuntu18.04-py3.6 c7440af853b5 4 days ago 9.26GB
However, when I next tried to start the notebook server:
# docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04-py3.6
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
This seems surprising, because both GTX 1080 Ti GPUs are detected
# nvidia-smi
Fri May 8 16:41:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:08:00.0 Off | N/A |
| 21% 38C P8 10W / 250W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... On | 00000000:42:00.0 Off | N/A |
| 23% 42C P8 10W / 250W | 1MiB / 11177MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
After cleaning up
# docker system prune -a
# apt-get purge docker docker-engine docker.io containerd runc
I reinstalled Docker and pulled the rapidsai image again. The result was unchanged.
Is there a conflict with NVIDIA driver version 440.33.01?
Any suggestions?
Thanks for trying out RAPIDS. Did you happen to install nvidia-container-toolkit? https://github.com/NVIDIA/nvidia-docker#quickstart. I didn't see that in your steps, and missing it could cause that issue. It's in our prerequisites on https://rapids.ai/start.html
From the NVIDIA CUDA/WSL 2 documentation:
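A quick way to check for the missing piece is to see which of the relevant binaries are on PATH. This is a minimal sketch: `have_cmd` is a hypothetical helper, and it assumes the toolkit package provides `nvidia-container-runtime-hook` (which, as far as I know, is what Docker 19.03 looks for when `--gpus` is used) and that `nvidia-container-cli` comes in as its dependency.

```shell
#!/bin/sh
# Report whether a command is on PATH.
# Prints "found: <name>" or "missing: <name>" (and returns non-zero if missing).
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Binary names below are assumptions about what the packages install.
have_cmd docker || true
have_cmd nvidia-container-runtime-hook || true
have_cmd nvidia-container-cli || true
```

If the hook is reported missing while docker is found, installing nvidia-container-toolkit and restarting Docker is the likely fix.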
Use the Docker installation script to install Docker for your choice of WSL 2 Linux distribution. Note that NVIDIA Container Toolkit does not yet support Docker Desktop WSL 2 backend.
I just followed these steps and it works fine:
To uninstall previous nvidia-docker packages, issue these commands:
[user@gpu1 ~]# docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
[user@gpu1 ~]# sudo apt-get remove nvidia-docker
To install the NVIDIA GPU container toolkit for Docker, you first need to add the package repositories:
user@ubuntu-gpu1:~# distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
user@ubuntu-gpu1:~# curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
user@ubuntu-gpu1:~# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
user@ubuntu-gpu1:~# sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
user@ubuntu-gpu1:~# sudo systemctl restart docker
Then verify the installation by running nvidia-smi in the latest official CUDA image:
user@ubuntu-gpu1:~# sudo docker run -it --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
Try this
sudo apt install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker
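After the restart, one way to confirm that the package took effect is to look for an nvidia runtime entry in the Docker daemon config. This is a rough sketch: `has_nvidia_runtime` is a hypothetical helper, and it assumes nvidia-docker2 registers its runtime in /etc/docker/daemon.json under the name "nvidia".

```shell
#!/bin/sh
# Return success if the given Docker daemon config mentions an "nvidia" runtime.
# Defaults to /etc/docker/daemon.json when no path is given.
has_nvidia_runtime() {
    config=${1:-/etc/docker/daemon.json}
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered in daemon.json"
else
    echo "no nvidia runtime found in daemon.json"
fi
```

This is a plain text match rather than a JSON parse, so treat a positive result as a hint and not as proof that the runtime is configured correctly.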