Free memory occupied by cudaMemGetInfo

Question

I have the following simple code to find available GPUs:

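#include <stdlib.h>
#include <cuda_runtime.h>

/* Returns a malloc'd list of device indices whose free memory exceeds
   90% of total memory. The caller must free() the list. */
int * getFreeGpuList(int *numFree) {
    int * gpuList;
    int nDevices;
    int i, j = 0, count = 0;

    cudaGetDeviceCount(&nDevices);
    gpuList = (int *) malloc(nDevices * sizeof(int));
    for (i = 0; i < nDevices; ++i) {
        /* Make device i current so the query below targets it. */
        cudaSetDevice(i);
        size_t freeMem;
        size_t totalMem;
        cudaMemGetInfo(&freeMem, &totalMem);
        /* Treat the device as free if more than 90% of its memory is unused. */
        if (freeMem > .9 * totalMem) {
            gpuList[j] = i;
            count++;
            j++;
        }
    }
    *numFree = count;
    return gpuList;
}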

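The problem is that cudaMemGetInfo occupies some memory on each GPU (~150 MB in my case). This code is part of a large, long-running program, and I often run several processes at the same time, so the total memory taken up by this function ends up being significant. Can you tell me how to free the GPU memory occupied by cudaMemGetInfo? Thanks!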

Answer 1

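Following talonmies' insight above that cudaSetDevice creates a context and occupies some memory on the device, I found that cudaDeviceReset can "explicitly destroy and clean up all resources associated with the current device in the current process", without affecting other processes on the same device.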

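Update, Nov 26: If you only need to query GPU information, it is better to use the NVML library. In my experience it is faster, and it does not occupy GPU memory for simple memory and name queries.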

