How to reduce CUDA context size (Multi-Process Service)

I followed Robert Crovella's example on how to use Nvidia's Multi-Process Service (MPS). According to the docs:

2.1.2. Reduced on-GPU context storage

Without MPS each CUDA processes using a GPU allocates separate storage and scheduling resources on the GPU. In contrast, the MPS server allocates one copy of GPU storage and scheduling resources shared by all its clients.

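I understood this as a reduction in the context size of each process, which is possible because the resources are shared. That would increase the free GPU memory and so allow more processes to run in parallel.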
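To make that expectation concrete, here is a minimal arithmetic sketch. The function name `max_processes` and every number in it are hypothetical (they are not measurements from this question), and it assumes each process needs a fixed amount of memory while any shared cost is paid only once.

```python
def max_processes(total_mib, per_process_mib, shared_mib=0):
    """Return how many processes fit into GPU memory.

    total_mib:       total GPU memory in MiB
    per_process_mib: memory each process needs on its own
    shared_mib:      one-off cost shared by all processes
    """
    if per_process_mib <= 0:
        raise ValueError("per_process_mib must be positive")
    usable = total_mib - shared_mib
    return max(usable // per_process_mib, 0)


# Hypothetical numbers: a 16 GiB card, 300 MiB of context
# and 200 MiB of data per process.
without_sharing = max_processes(16384, 300 + 200)
# If the 300 MiB context were paid once and shared by everyone,
# each process would only need its own 200 MiB of data.
with_sharing = max_processes(16384, 200, shared_mib=300)
print(without_sharing, with_sharing)
```

Under these assumed numbers, sharing the context would more than double the number of processes that fit, which is why the per-process footprint matters.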

Now, back to the example. Without MPS:

And with MPS:

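Unfortunately, each process still takes up almost the same amount of memory (~300MB). Doesn't this contradict the docs? Is there a way to reduce the memory consumption per process?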
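For anyone who wants to compare the per-process numbers without screenshots, the following is a rough sketch that reads them from `nvidia-smi`. It assumes the `--query-compute-apps` query with the `pid`, `process_name` and `used_memory` fields is available in your driver version; the helper names `parse_compute_apps` and `query_compute_apps` are made up for this example.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV lines into a list of dicts."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from both ends so a comma inside the process name is harmless.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        mem = mem.strip()
        rows.append({
            "pid": int(pid),
            "name": name.strip(),
            # The memory field can be non-numeric (for example "[N/A]").
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return rows


def query_compute_apps():
    """Run nvidia-smi (needs an NVIDIA driver installed) and return parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

Calling `query_compute_apps()` once with MPS off and once with MPS on, while the same client processes are running, gives two lists that can be compared directly.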

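Oops, I was too quick to ask before checking the memory usage on another (pre-Volta) card. Yes, there actually is a difference. In case anyone else stumbles upon this issue, let me post it here for future reference: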

MPS off:

MPS on:

Indeed, as noted in the documentation, on the Volta architecture the client processes communicate with the GPU directly, with no MPS server in between:

Volta MPS clients submit work directly to the GPU without passing through the MPS server.

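This can easily be seen in your first screenshot, where the t1034 processes are listed as using the GPU.

In contrast, on pre-Volta architectures the client processes communicate with the GPU through the MPS server. As a result, in the later screenshots only the MPS server process is seen communicating directly with the GPU.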
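The distinction described above can be checked mechanically from the list of process names that `nvidia-smi` reports. This is only a sketch: it assumes the MPS server shows up under the name `nvidia-cuda-mps-server`, and the function name `describe_process_list` is hypothetical.

```python
MPS_SERVER_NAME = "nvidia-cuda-mps-server"  # assumed name of the MPS server process


def describe_process_list(process_names):
    """Describe which kinds of processes appear in an nvidia-smi process list."""
    server = any(MPS_SERVER_NAME in name for name in process_names)
    clients = any(MPS_SERVER_NAME not in name for name in process_names)
    if server and clients:
        # Clients are visible next to the server (what Volta MPS would show).
        return "MPS server and client processes both listed"
    if server:
        # Clients are hidden behind the server (what pre-Volta MPS would show).
        return "only the MPS server listed"
    if clients:
        return "only application processes listed"
    return "no compute processes listed"
```

For example, `describe_process_list(["nvidia-cuda-mps-server"])` reports that only the server is listed, which matches the pre-Volta behaviour described in this answer.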
