NVIDIA Docker - initialization error: nvml error: driver not loaded

I'm new to Docker, so the question below may be a bit naive, but I'm stuck and need help.

I'm trying to reproduce some research results. The authors released code along with a specification of how to build a Docker image that reproduces their results. The relevant bits are copied below:

I believe I have installed Docker correctly:

$ docker --version
Docker version 19.03.13, build 4484c46d9d
$ sudo docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

However, when I try to check whether the nvidia-docker installation succeeded, I get the following error:

$ sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\n\\"\"": unknown.

It looks like the key error is:

nvidia-container-cli: initialization error: nvml error: driver not loaded

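I don't have a GPU locally, and I have found conflicting information about whether CUDA needs to be installed before NVIDIA Docker. For example, this NVIDIA moderator says "A proper nvidia docker plugin installation starts with a proper CUDA install on the base machine."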

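My questions are as follows:

  1. Can I install NVIDIA Docker without having CUDA installed?

  2. If so, what is the source of this error, and how do I fix it?

  3. If not, how do I create this Docker image to reproduce the results?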

  1. Can I install NVIDIA Docker without having CUDA installed?

Yes, you can. The readme states that nvidia-docker only requires the NVIDIA GPU driver and the Docker engine to be installed:

Note that you do not need to install the CUDA Toolkit on the host system, but the NVIDIA driver needs to be installed
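Since the readme lists only the driver and the Docker engine, a host check does not need to look for the CUDA Toolkit at all. Below is a minimal sketch of such a check as hypothetical shell helpers (`nvidia_driver_loaded`, `check_nvidia_docker_prereqs` are illustrative names, not part of nvidia-docker). It assumes a Linux host where the NVIDIA kernel module exposes `/proc/driver/nvidia/version` while it is loaded; a missing file roughly corresponds to what NVML reports as "driver not loaded".

```shell
#!/bin/sh
# Hypothetical prerequisite check for nvidia-docker on a Linux host.
# Per the readme, only the NVIDIA driver and Docker are required,
# so the CUDA Toolkit is deliberately NOT checked here.

# Assumption: the NVIDIA kernel module creates this file while it is loaded.
NVIDIA_PROC_FILE="${NVIDIA_PROC_FILE:-/proc/driver/nvidia/version}"

# Succeeds if the driver's proc file is readable (optional path argument for testing).
nvidia_driver_loaded() {
  [ -r "${1:-$NVIDIA_PROC_FILE}" ]
}

# Prints one line per prerequisite; returns non-zero if any is missing.
check_nvidia_docker_prereqs() {
  status=0
  if command -v docker >/dev/null 2>&1; then
    echo "docker: found"
  else
    echo "docker: missing"
    status=1
  fi
  if nvidia_driver_loaded "$1"; then
    echo "nvidia driver: loaded"
  else
    echo "nvidia driver: not loaded (docker run --gpus will fail with 'nvml error: driver not loaded')"
    status=1
  fi
  return "$status"
}
```

On a machine like the one in the question (no GPU, hence no driver), `check_nvidia_docker_prereqs` would report the driver as not loaded, which matches the error above.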

  2. If so, what is the source of this error and how do I fix it?

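That's because you either don't have a GPU locally, or it isn't an NVIDIA GPU, or something went wrong when you installed the driver. If you have a CUDA-capable GPU, I recommend following the NVIDIA guide to install the driver. If you don't have a GPU locally, you can still build an image with CUDA and then move it to a machine that has a GPU.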
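With Docker's default runtime, building the image does not touch the GPU; it is `docker run --gpus` that triggers the NVIDIA prestart hook, which is where the "driver not loaded" error comes from. A minimal sketch, assuming a Linux host where a loaded driver exposes `/proc/driver/nvidia/version`: the hypothetical `gpu_flags` and `run_image` helpers below request GPUs only when the driver is present, so the same command can be used on the local CPU-only machine and later on a GPU machine.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nvidia-docker or Docker).

# Print "--gpus all" only if the NVIDIA kernel driver appears to be loaded.
# Assumption: the driver creates /proc/driver/nvidia/version while loaded.
# An optional path argument overrides the file (useful for testing).
gpu_flags() {
  if [ -r "${1:-/proc/driver/nvidia/version}" ]; then
    echo "--gpus all"
  fi
}

# Run an image, adding GPU access only where the driver is available.
run_image() {
  # $(gpu_flags) is intentionally unquoted: "--gpus all" must split into two
  # arguments, and it expands to nothing on a host without the driver.
  # shellcheck disable=SC2046
  docker run $(gpu_flags) --rm "$@"
}

# Example (image name and command are placeholders):
#   docker build -t my-research-image .      # no GPU needed to build
#   run_image my-research-image python main.py
```

Without a driver, the container simply starts without GPU access, so whatever runs inside it must be able to work on the CPU.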

  3. If not, how do I create this Docker image to reproduce the results?

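The problem is that even if you manage to get rid of CUDA in the Docker image, there is software that needs it. In that case, fixing the Dockerfile seems unnecessary to me: you could ignore Docker and start modifying the code so that it runs on the CPU.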