How to connect to Nvidia MPS server from a Docker container?

Question

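I would like many Docker containers to overlap their use of a GPU. Nvidia provides a utility for this, the Multi-Process Service (MPS), which is described in Nvidia's documentation. Specifically, it says: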

When CUDA is first initialized in a program, the CUDA driver attempts to connect to the MPS control daemon. If the connection attempt fails, the program continues to run as it normally would without MPS. If however, the connection attempt succeeds, the MPS control daemon proceeds to ensure that an MPS server, launched with same user id as that of the connecting client, is active before returning to the client. The MPS client then proceeds to connect to the server. All communication between the MPS client, the MPS control daemon, and the MPS server is done using named pipes.

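By default, the named pipes are placed in /tmp/nvidia-mps/, so I share that directory with the containers using a volume.

But this is not enough for the CUDA driver in the container to "see" the MPS server.

Which resources should I share between the host and the container so that it can connect to the MPS server?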

Answer 1

To start a container that has access to MPS, it must bind-mount /tmp/nvidia-mps and use the same inter-process communication (IPC) namespace as the host.

For example:

docker run -v /tmp/nvidia-mps:/tmp/nvidia-mps --ipc=host nvidia/cuda
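
To cover the "many containers" case from the question, the host-side steps can be sketched as below. This is only a sketch under assumptions: the helper names (run, start_mps_clients) and the DRY_RUN switch are made up for illustration, the pipe directory is the default one quoted in the question, and GPU-enabling flags are left out exactly as in the command above. By default the script only prints the commands so they can be reviewed before anything is started.

```shell
#!/bin/sh
# Sketch: start the MPS control daemon on the host, then launch several
# containers that share it. DRY_RUN=1 (the default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"
MPS_PIPE_DIR=/tmp/nvidia-mps   # default pipe directory, as in the question

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: start the daemon, then <count> containers of <image>.
start_mps_clients() {
    count="$1"
    image="$2"
    # -d starts the MPS control daemon as a background process.
    run nvidia-cuda-mps-control -d
    i=1
    while [ "$i" -le "$count" ]; do
        # Same flags as the example above, plus -d to detach the container.
        run docker run -d -v "$MPS_PIPE_DIR:$MPS_PIPE_DIR" --ipc=host "$image"
        i=$((i + 1))
    done
}

start_mps_clients 2 nvidia/cuda
```

Setting DRY_RUN=0 executes the commands instead of printing them. The daemon has to run as the user that the answer's setup expects, and each container still needs a CUDA workload of its own to actually exercise MPS.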

Answer 2

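I don't think mapping /tmp/nvidia-mps into the container is necessary. As long as the IPC namespace is the same, it should work.

If you run the MPS control daemon on the host, then you need the docker run flag --ipc=host, as mentioned above, because MPS uses /dev/shm on the host (which is what the IPC namespace maps to). The --ipc=host flag tells Docker to map the host's /dev/shm into the container rather than creating a private /dev/shm inside the container.

For example: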

docker run --ipc=host nvidia/cuda

Note that it is also possible to host MPS inside a container and share that container's IPC namespace (/dev/shm) with other containers.
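
A rough sketch of that container-hosted variant follows. Several things here are assumptions rather than anything confirmed in this thread: the container name mps-daemon and the helper names are hypothetical, the image is assumed to expose nvidia-cuda-mps-control, and the daemon is started with -f to keep it in the foreground as the container's main process. Because the two answers disagree on whether the pipe directory must be shared, the sketch also passes it along with --volumes-from; drop that part if the IPC namespace alone turns out to be enough. As written it only prints the commands.

```shell
#!/bin/sh
# Sketch: run the MPS control daemon in one container and let client
# containers join that container's IPC namespace.
DRY_RUN="${DRY_RUN:-1}"
MPS_CONTAINER=mps-daemon       # hypothetical container name

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_mps_container() {
    image="$1"
    # --ipc=shareable lets other containers join this IPC namespace.
    # -v /tmp/nvidia-mps declares a volume for the pipe directory so that
    # clients can pick it up with --volumes-from (an assumption, see above).
    run docker run -d --name "$MPS_CONTAINER" --ipc=shareable \
        -v /tmp/nvidia-mps "$image" nvidia-cuda-mps-control -f
}

start_client() {
    image="$1"
    # --ipc=container:<name> joins the IPC namespace of the MPS container.
    run docker run -d --ipc="container:$MPS_CONTAINER" \
        --volumes-from "$MPS_CONTAINER" "$image"
}

start_mps_container nvidia/cuda
start_client nvidia/cuda
```

The MPS container has to be started first, since --ipc=container:mps-daemon and --volumes-from both refer to a container that already exists.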

