Unexpected read access violation error in CUDA when working with unified memory

Question

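I have an object, say d_obj, that has some members in unified memory and some members in device memory. I then call a CUDA kernel that takes the object and works with it. I would like the CPU to do something with the members in unified memory immediately after the kernel call, but that fails. Here I reproduce my problem with a short piece of code: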

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>  // exit, EXIT_FAILURE (used by CHECK_CUDA)

#define CHECK_CUDA(call)                                            \
{                                                                   \
const cudaError_t error = call;                                     \
if (error != cudaSuccess)                                           \
{                                                                   \
printf("ERROR:: File: %s, Line: %d, ", __FILE__, __LINE__);         \
printf("code: %d, reason: %s\n", error, cudaGetErrorString(error)); \
exit(EXIT_FAILURE);                                                 \
}                                                                   \
}

class MyClass
{
public:
    MyClass(int n_) : n(n_) { }
    void allocateMeOnDevice() {
        CHECK_CUDA(cudaMalloc((void**)&vec, n * sizeof(float)));
    }
    int n;
    float* vec;
};

__global__ void kernel(MyClass* obj) {
    for (int i = 0; i < obj->n; i++) {
        obj->vec[i] = 1;
    }
}

int main() {
    
    int n = 1000;

    MyClass h_obj(n); 

    MyClass* d_obj;
    CHECK_CUDA(cudaMallocManaged((void**)&d_obj, sizeof(MyClass)));
    CHECK_CUDA(cudaMemcpy(d_obj, &h_obj, sizeof(MyClass), cudaMemcpyHostToDevice));
    d_obj->allocateMeOnDevice();

    kernel<<<1, 1>>>(d_obj);

    //CHECK_CUDA(cudaDeviceSynchronize()); 
    printf("** d_obj->n is %d\n", d_obj->n); // <-- Read access violation if the above line is commented out    

}

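Is it not possible to access something in unified memory from both the host and the device at the same time? Is there any workaround for this problem?

OS: Windows 10 / CUDA 11.2 / Device: GeForce RTX 3090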

Answer 1

On Windows, with any recent version of CUDA (say, 9.0 or newer), the behavior of unified memory (or managed memory, which is a synonym) is described as follows:

Applications running on Windows (whether in TCC or WDDM mode) will use the basic Unified Memory model as on pre-6.x architectures even when they are running on hardware with compute capability 6.x or higher.
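One way to see which model a given system uses is to query the device attribute for concurrent managed access. The following is a minimal sketch, not part of the original answer; the helper name concurrent_managed_access is hypothetical, and device 0 is assumed to be the GPU of interest. A result of 0 suggests the basic model quoted above applies, so the CPU must not touch managed data while a kernel is in flight.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the device reports support for
// concurrent CPU/GPU access to managed memory, 0 if it does not,
// and -1 if the query itself fails.
int concurrent_managed_access(int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &value, cudaDevAttrConcurrentManagedAccess, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return value;
}

int main() {
    // Assumption: device 0 is the GPU being used.
    int supported = concurrent_managed_access(0);
    printf("concurrentManagedAccess = %d\n", supported);
    return 0;
}
```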

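Later on, the documentation indicates that for such systems it is necessary to issue a cudaDeviceSynchronize() after a kernel launch before the CPU can access managed data again.

If you fail to do that on Windows, you will hit a seg fault in any CPU code that attempts to access managed data.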
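Applied to the code in the question, this means synchronizing between the kernel launch and the first host access to the managed object. Below is a minimal sketch under that assumption; it simplifies MyClass to a plain struct, and the function name run_with_sync is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                              \
    do {                                                              \
        const cudaError_t error = (call);                             \
        if (error != cudaSuccess) {                                   \
            printf("ERROR:: File: %s, Line: %d, ", __FILE__, __LINE__); \
            printf("code: %d, reason: %s\n", (int)error,              \
                   cudaGetErrorString(error));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Simplified stand-in for the question's MyClass.
struct MyClass {
    int n;
    float* vec;  // points to ordinary device memory
};

__global__ void kernel(MyClass* obj) {
    for (int i = 0; i < obj->n; i++) {
        obj->vec[i] = 1.0f;
    }
}

// Hypothetical helper: runs the kernel, then reads obj->n on the host.
int run_with_sync(int n) {
    MyClass* obj = nullptr;
    CHECK_CUDA(cudaMallocManaged((void**)&obj, sizeof(MyClass)));

    // Host writes are fine here: no kernel has been launched yet.
    obj->n = n;
    CHECK_CUDA(cudaMalloc((void**)&obj->vec, n * sizeof(float)));

    kernel<<<1, 1>>>(obj);
    CHECK_CUDA(cudaGetLastError());       // catch launch errors
    CHECK_CUDA(cudaDeviceSynchronize());  // required before host access

    int seen = obj->n;  // safe: the GPU is idle again

    CHECK_CUDA(cudaFree(obj->vec));
    CHECK_CUDA(cudaFree(obj));
    return seen;
}

int main() {
    printf("** obj->n is %d\n", run_with_sync(1000));
    return 0;
}
```

The synchronization makes the host wait for the kernel, so this does not give concurrent host and device access; it only makes the access legal under the basic unified memory model.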

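Some possible workarounds:

Switch to Linux (assuming your GPU is cc6.x or higher).
Use host pinned ("zero-copy") memory instead of managed memory. However, this may have performance implications for bulk or large-scale data access.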
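The second workaround could look roughly like the sketch below, which places the small descriptor struct in mapped pinned host memory so the host can read it while the kernel is still running. This is an illustration rather than the answerer's code: the names Params, fill, and run_zero_copy are hypothetical, and a 64-bit platform with unified virtual addressing is assumed (otherwise cudaSetDeviceFlags(cudaDeviceMapHost) may be needed before any other CUDA call).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                              \
    do {                                                              \
        const cudaError_t error = (call);                             \
        if (error != cudaSuccess) {                                   \
            printf("ERROR:: File: %s, Line: %d, ", __FILE__, __LINE__); \
            printf("code: %d, reason: %s\n", (int)error,              \
                   cudaGetErrorString(error));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical descriptor: lives in pinned host memory.
struct Params {
    int n;
    float* vec;  // bulk data stays in ordinary device memory
};

__global__ void fill(Params* p) {
    for (int i = 0; i < p->n; i++) {
        p->vec[i] = 1.0f;
    }
}

// Hypothetical helper: reads p->n on the host without synchronizing first.
int run_zero_copy(int n) {
    Params* h_p = nullptr;
    CHECK_CUDA(cudaHostAlloc((void**)&h_p, sizeof(Params),
                             cudaHostAllocMapped));
    h_p->n = n;
    CHECK_CUDA(cudaMalloc((void**)&h_p->vec, n * sizeof(float)));

    // Device-side alias of the same pinned allocation.
    Params* d_p = nullptr;
    CHECK_CUDA(cudaHostGetDevicePointer((void**)&d_p, h_p, 0));

    fill<<<1, 1>>>(d_p);
    CHECK_CUDA(cudaGetLastError());

    // Host read while the kernel may still be running. The kernel never
    // writes n, so the value is well defined here.
    int seen = h_p->n;

    // Still synchronize before freeing or reading kernel results.
    CHECK_CUDA(cudaDeviceSynchronize());
    CHECK_CUDA(cudaFree(h_p->vec));
    CHECK_CUDA(cudaFreeHost(h_p));
    return seen;
}

int main() {
    printf("** n is %d\n", run_zero_copy(1000));
    return 0;
}
```

Only the small struct is kept in pinned memory here; the large array remains in device memory, which limits the performance cost mentioned above because each device access to pinned host memory crosses the bus.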


cuda

unified-memory