How to control the resources of each client in NVIDIA MPS

Question

With NVIDIA MPS, we start the MPS server by running sudo nvidia-cuda-mps-control -d. I have two questions.

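When I have multiple GPUs on the same server, how do I specify which GPU the MPS server runs on?
When I have multiple concurrent processes, how do I control the resources (such as compute and memory) allocated to each MPS client?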

Answer 1

The CUDA MPS documentation answers many questions like these.

How to specify which GPU to run mps-server when I have multiple GPUs on the same server.

From the CUDA MPS doc, section 2.3.4, the GPUs that are visible (via CUDA_VISIBLE_DEVICES) when the MPS server is started determine which GPUs it will use:

2.3.4. MPS on Multi-GPU Systems
The MPS server supports using multiple GPUs. On systems with more than one GPU,
you can use CUDA_VISIBLE_DEVICES to enumerate the GPUs you would like to use.
See section 4.2 for more details.
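
As a minimal sketch of the point above, the following Python helper builds an environment in which only the chosen GPUs are visible and then starts the control daemon from the question in that environment. The helper names mps_daemon_env and start_mps_daemon are hypothetical, and the sketch assumes nvidia-cuda-mps-control is on the PATH and that the caller already has the required privileges.

```python
import os
import subprocess


def mps_daemon_env(gpu_ids, base_env=None):
    """Return a copy of the environment in which only gpu_ids are visible."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


def start_mps_daemon(gpu_ids):
    """Start the MPS control daemon so that it only sees gpu_ids.

    Hypothetical helper: it runs the command from the question without sudo.
    sudo usually resets the environment, so with sudo the variable would have
    to be passed through explicitly.
    """
    subprocess.run(
        ["nvidia-cuda-mps-control", "-d"],
        env=mps_daemon_env(gpu_ids),
        check=True,
    )
```

Calling start_mps_daemon([1]) on a two-GPU machine would, under these assumptions, start MPS with only the second GPU visible to it.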

How to control the resources (such as computation and memory) allocated for each mps client when I have multiple concurrent processes?

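From section 2.3.5.2 of the same doc, the primary method for per-process compute allocation is to set the environment variable CUDA_MPS_ACTIVE_THREAD_PERCENTAGE. The value of this variable at the time the process starts and initializes the CUDA runtime or driver API determines its available compute resources (SMs), expressed as a percentage. If you have multiple GPUs, it is a percentage of the SM resources on whichever GPU your application selects with cudaSetDevice() or similar.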
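
Because the variable is read when the client process initializes CUDA, it has to be set before the client starts. A rough sketch of launching clients with different compute shares from Python follows; mps_client_env and launch_client are hypothetical helper names, and the client command is whatever CUDA application you want to run.

```python
import os
import subprocess


def mps_client_env(thread_percentage, base_env=None):
    """Return an environment that caps an MPS client's share of SMs."""
    if not 0 < thread_percentage <= 100:
        raise ValueError("thread_percentage must be in the range (0, 100]")
    env = dict(os.environ if base_env is None else base_env)
    # Must be in place before the client initializes the CUDA runtime/driver.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_percentage)
    return env


def launch_client(cmd, thread_percentage):
    """Start one client process (cmd is an argument list) with its own limit."""
    return subprocess.Popen(cmd, env=mps_client_env(thread_percentage))
```

For example, launch_client(["./app_a"], 25) and launch_client(["./app_b"], 50) would start two clients with different SM percentages, where ./app_a and ./app_b are placeholder executables.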

~~MPS does not currently provide a mechanism for per-process memory allocation/partitioning.~~

Edit (update): CUDA 11.5, publicly released on October 20, 2021, added a new feature that allows per-client memory limits in MPS.
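
The answer does not say how the per-client limit is configured. As far as I recall from the MPS documentation, one way is the environment variable CUDA_MPS_PINNED_DEVICE_MEM_LIMIT, set in the client's environment with a value such as 0=1G,1=512MB (device ordinal, then limit). Treat the variable name and value format as assumptions to verify against the documentation for your CUDA version; the helper names below are hypothetical.

```python
import os


def format_mem_limits(limits):
    """Turn {0: "1G", 1: "512MB"} into the string "0=1G,1=512MB"."""
    return ",".join(f"{dev}={size}" for dev, size in sorted(limits.items()))


def mps_client_env_with_mem_limit(limits, base_env=None):
    """Return an environment carrying per-device memory limits for a client.

    Assumption: the variable name and format follow my reading of the MPS
    documentation for CUDA 11.5 and later; check them before relying on this.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_MPS_PINNED_DEVICE_MEM_LIMIT"] = format_mem_limits(limits)
    return env
```

The resulting environment can be passed to subprocess.Popen(..., env=...) when starting a client, in the same way as for the thread percentage.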


cuda

gpu