How to pass a vector of int into a CUDA global function

Question

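I'm writing my first CUDA program and am running into a lot of issues, as my main programming language is not C++.

In my console application I have a vector of int that holds a fixed list of numbers. My code should create new vectors and check them for matches against the original constant vector.

I don't know how to pass or copy the vector's pointer into the GPU device. After trying to convert my code from C# to C++ and use a kernel, I get this error message: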

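Error calling a host function("std::vector<int, ::std::allocator >::vector()") from a global function("MagicSeedCUDA::bigCUDAJob") is not allowed

Here is part of my code: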

std::vector<int> selectedList;
FillA1(A1, "0152793281263155465283127699107744880041");
selectedList = A1;
bigCUDAJob<< <640, 640, 640>> >(i, j, selectedList);

__global__ void bigCUDAJob(int i, int j, std::vector<int> selectedList)
    {    
        std::vector<int> tempList;
        // here comes code that adds numbers to tempList
        // code to find matches between tempList and the 
        // parameter selectedList 
    }

How can I modify my code so that I don't get the compiler error? I can also work with an array of int.

Answer 1

I don't know how to pass / copy pointers of a vector into the GPU device

First, remind yourself how to pass memory that is not in a std::vector to a CUDA kernel. (Re)read the vectorAdd example program, which is part of NVIDIA's CUDA samples.

cudaError_t status;
std::vector<int> selectedList;

// ... etc. ...

int *selectedListOnDevice = NULL;
std::size_t selectedListSizeInBytes = sizeof(int) * selectedList.size();
status = cudaMalloc((void **)&selectedListOnDevice, selectedListSizeInBytes);
if (status != cudaSuccess) { /* handle error */ }
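// copy host -> device; cudaMemcpy needs the direction as its fourth argument
status = cudaMemcpy(selectedListOnDevice, selectedList.data(), selectedListSizeInBytes, cudaMemcpyHostToDevice);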
if (status != cudaSuccess) { /* handle error */ }

// ... etc. ...

// eventually:
cudaFree(selectedListOnDevice);

That uses the official CUDA runtime API. If, however, you use my CUDA API wrappers (which you absolutely don't have to), the above becomes:

auto selectedListOnDevice = cuda::memory::make_unique<int[]>(selectedList.size());
cuda::memory::copy(selectedListOnDevice.get(), selectedList.data());

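and you don't need to handle the errors yourself: on error, an exception is thrown.

Another option is to use NVIDIA's Thrust library, which offers a std::vector-like class called a "device vector". This lets you write: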

thrust::device_vector<int> selectedListOnDevice = selectedList;

and it should "just work".
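
Note that a thrust::device_vector is still a host-side object, so the kernel itself receives a raw pointer obtained from it. A minimal sketch of that pattern, assuming a trivial kernel that multiplies every element; scaleKernel and scaleOnGpu are hypothetical names used only for illustration:

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <vector>

// Hypothetical kernel: multiplies each element in place.
__global__ void scaleKernel(int *data, int count, int factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) { data[idx] *= factor; }
}

// Hypothetical host helper: copies the vector to the GPU, runs the kernel,
// and copies the result back.
std::vector<int> scaleOnGpu(const std::vector<int> &hostValues, int factor)
{
    if (hostValues.empty()) { return {}; }

    // Allocates device memory and copies the host data into it.
    thrust::device_vector<int> deviceValues = hostValues;
    const int count = static_cast<int>(deviceValues.size());

    // The kernel cannot take the device_vector itself, only a raw pointer.
    int *rawPointer = thrust::raw_pointer_cast(deviceValues.data());

    const int threadsPerBlock = 256;
    const int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
    scaleKernel<<<blocks, threadsPerBlock>>>(rawPointer, count, factor);
    cudaDeviceSynchronize();

    // Copy device -> host; the device memory is freed when deviceValues
    // goes out of scope.
    std::vector<int> result(hostValues.size());
    thrust::copy(deviceValues.begin(), deviceValues.end(), result.begin());
    return result;
}
```

With this sketch, scaleOnGpu({1, 2, 3}, 2) would be expected to return a vector holding 2, 4, 6.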

I get this error message:

Error calling a host function("std::vector<int, ::std::allocator >
::vector()") from a global function("MagicSeedCUDA::bigCUDAJob") is
not allowed

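As @paleonix notes, this is covered in Using std::vector in CUDA device code. In short: std::vector's constructors and member functions are host-only code, so no matter how you try to write it, you simply cannot have a std::vector appear in your __device__ or __global__ functions.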
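
The usual workaround is to give the kernel a raw device pointer plus an element count instead of the vector. A rough sketch under stated assumptions: the question does not show how matches are defined, so the rule used here (one thread per candidate value, counted once if it occurs in the list) and the names countMatchesKernel, countMatchesOnGpu and check are hypothetical:

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical kernel: each thread checks one candidate against selectedList.
__global__ void countMatchesKernel(const int *selectedList, int selectedCount,
                                   const int *candidates, int candidateCount,
                                   int *matchCount)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= candidateCount) { return; }
    for (int k = 0; k < selectedCount; ++k) {
        if (selectedList[k] == candidates[idx]) {
            atomicAdd(matchCount, 1);  // count each candidate at most once
            return;
        }
    }
}

// Turns a CUDA error code into an exception.
// (Sketch only: device buffers are not released if this throws.)
static void check(cudaError_t status)
{
    if (status != cudaSuccess) { throw std::runtime_error(cudaGetErrorString(status)); }
}

int countMatchesOnGpu(const std::vector<int> &selectedList,
                      const std::vector<int> &candidates)
{
    if (selectedList.empty() || candidates.empty()) { return 0; }
    const int selectedCount = static_cast<int>(selectedList.size());
    const int candidateCount = static_cast<int>(candidates.size());

    int *selectedOnDevice = NULL;
    int *candidatesOnDevice = NULL;
    int *matchCountOnDevice = NULL;
    check(cudaMalloc((void **)&selectedOnDevice, sizeof(int) * selectedCount));
    check(cudaMalloc((void **)&candidatesOnDevice, sizeof(int) * candidateCount));
    check(cudaMalloc((void **)&matchCountOnDevice, sizeof(int)));
    check(cudaMemcpy(selectedOnDevice, selectedList.data(),
                     sizeof(int) * selectedCount, cudaMemcpyHostToDevice));
    check(cudaMemcpy(candidatesOnDevice, candidates.data(),
                     sizeof(int) * candidateCount, cudaMemcpyHostToDevice));
    check(cudaMemset(matchCountOnDevice, 0, sizeof(int)));

    const int threadsPerBlock = 256;
    const int blocks = (candidateCount + threadsPerBlock - 1) / threadsPerBlock;
    countMatchesKernel<<<blocks, threadsPerBlock>>>(
        selectedOnDevice, selectedCount, candidatesOnDevice, candidateCount,
        matchCountOnDevice);
    check(cudaGetLastError());  // reports launch failures

    int matchCount = 0;
    check(cudaMemcpy(&matchCount, matchCountOnDevice, sizeof(int),
                     cudaMemcpyDeviceToHost));  // also waits for the kernel
    cudaFree(selectedOnDevice);
    cudaFree(candidatesOnDevice);
    cudaFree(matchCountOnDevice);
    return matchCount;
}
```

Temporary lists that the kernel builds would likewise need to be fixed-size local arrays or preallocated device buffers rather than std::vector. Also note that the third parameter in a launch such as <<<640, 640, 640>>> is the dynamic shared-memory size in bytes, not a third grid dimension.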

I'm writing my first CUDA program and encounter a lot of issues, as my main programming language is not C++.

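Then, regardless of the specific issue with std::vector, you should take some time to study C++ programming. Alternatively, you could brush up on C programming, since you can write CUDA kernels that are C'ish rather than C++'ish; but C++ features are actually quite useful when writing kernels, not just on the host side.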
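
As one illustration of a C++ feature that can help in device code, a kernel may be a template, so a single body serves several element types. A small sketch; fillKernel and fillOnGpu are hypothetical names, and error checking is left out for brevity:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Templated kernel: the same code is instantiated for int, float, etc.
template <typename T>
__global__ void fillKernel(T *data, int count, T value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) { data[idx] = value; }
}

// Hypothetical host helper: fills `count` elements on the GPU and returns them.
// Error checking is omitted here; real code should test every cudaError_t.
template <typename T>
std::vector<T> fillOnGpu(int count, T value)
{
    if (count <= 0) { return {}; }

    T *dataOnDevice = NULL;
    cudaMalloc((void **)&dataOnDevice, sizeof(T) * count);

    const int threadsPerBlock = 256;
    const int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
    fillKernel<T><<<blocks, threadsPerBlock>>>(dataOnDevice, count, value);

    // Copy device -> host; this waits for the kernel to finish first.
    std::vector<T> result(count);
    cudaMemcpy(result.data(), dataOnDevice, sizeof(T) * count,
               cudaMemcpyDeviceToHost);
    cudaFree(dataOnDevice);
    return result;
}
```

A C-style version would need one kernel per element type, for example separate int and float variants.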
