CUDA process lifetime

It looks like I have not understood some CUDA basics. I am launching kernels on a dual-GPU card from a C++ GUI application. When I start the host process, nvidia-smi does not list any process. That is expected, since the host process waits until I click a button before it touches CUDA and launches the kernels. When I press the button, both kernels run fine on both GPUs, finish, and return the expected results. The host process is then listed twice by nvidia-smi, once per GPU. Both entries remain visible in nvidia-smi until I exit the host process.

What confuses me is that there is no such thing as a cudaOpen() / cudaClose() function (or a similar pair of functions).

Which CUDA API call causes the process to be listed by nvidia-smi? And which CUDA API call causes it to be removed from that list?

This is explained in section 3.2.1 of the CUDA documentation: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#initialization

3.2.1. Initialization

There is no explicit initialization function for the runtime; it initializes the first time a runtime function is called (more specifically any function other than functions from the error handling and version management sections of the reference manual). One needs to keep this in mind when timing runtime function calls and when interpreting the error code from the first call into the runtime.
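As an aside (not part of the quoted documentation): a common way to make this implicit initialization happen at a predictable point, rather than inside your first "real" CUDA call, is to issue a harmless runtime call up front. A minimal sketch, assuming the usual cudaFree(0) idiom:

```cpp
// Sketch: force the implicit runtime initialization early so it is not
// attributed to (or timed as part of) the first real CUDA call.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Any runtime call outside the error-handling/version-management sections
    // triggers initialization; cudaFree(0) is a common no-op way to do it.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::printf("CUDA initialization failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Runtime initialized on the current device.\n");
    return 0;
}
```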

The runtime creates a CUDA context for each device in the system (see Context for more details on CUDA contexts). This context is the primary context for this device and is initialized at the first runtime function which requires an active context on this device. It is shared among all the host threads of the application. As part of this context creation, the device code is just-in-time compiled if necessary (see Just-in-Time Compilation) and loaded into device memory. This all happens transparently. If needed, e.g. for driver API interoperability, the primary context of a device can be accessed from the driver API as described in Interoperability between Runtime and Driver APIs.
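To connect this to the question: the two nvidia-smi entries correspond to the two primary contexts, one per GPU. A rough sketch (assuming two visible devices, as in the question; the 1-byte cudaMalloc is just an arbitrary context-requiring call):

```cpp
// Sketch: touching each device with a call that needs an active context
// creates that device's primary context, which is why nvidia-smi then
// lists the host process once per GPU.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);      // device query alone does not create a context
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);          // make this device current for the calling thread
        void* p = nullptr;
        cudaMalloc(&p, 1);           // first context-requiring call: primary context created here
        cudaFree(p);
        std::printf("Primary context created on device %d\n", dev);
    }
    return 0;
}
```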

When a host thread calls cudaDeviceReset(), this destroys the primary context of the device the host thread currently operates on (i.e., the current device as defined in Device Selection). The next runtime function call made by any host thread that has this device as current will create a new primary context for this device.

Note: The CUDA interfaces use global state that is initialized during host program initiation and destroyed during host program termination. The CUDA runtime and driver cannot detect if this state is invalid, so using any of these interfaces (implicitly or explicitly) during program initiation or termination (after main()) will result in undefined behavior.
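So the closest thing to the cudaOpen()/cudaClose() pair asked about is the implicit primary-context creation on first use, and cudaDeviceReset() for teardown. A hedged sketch (my reading of the documentation above, not something the excerpt states about nvidia-smi) that drops the contexts before the process exits, after which the process should no longer be listed for those GPUs:

```cpp
// Sketch: destroy each device's primary context explicitly. Process exit
// would also tear down any remaining contexts.
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);      // select the device whose primary context we want to drop
        cudaDeviceReset();       // destroys the primary context of the current device
    }
    return 0;
}
```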