Cannot understand how JCuda cuLaunchKernel works?

Question

I am trying to learn how to use CUDA in Java. I am using JCuda.

Everything was fine until I came across an example containing this code:

    // Set up the kernel parameters: A pointer to an array
    // of pointers which point to the actual values.
    Pointer kernelParameters = Pointer.to(
        Pointer.to(new int[]{numElements}),
        Pointer.to(deviceInputA),
        Pointer.to(deviceInputB),
        Pointer.to(deviceOutput)
    );

The kernel prototype is:

    __global__ void add(int n, float *a, float *b, float *sum)

My question is: on the C side, isn't it as if we were passing something like this?

    (***n, ***a, ***b, ***sum)

So basically, do we always have to write something like this?

    Pointer kernelParameters = Pointer.to(double pointer, double pointer, ...)

Thanks

Answer 1

The cuLaunchKernel function of JCuda corresponds to the cuLaunchKernel function of CUDA. The signature of this function in CUDA is:

    CUresult cuLaunchKernel(
        CUfunction f,
        unsigned int gridDimX,
        unsigned int gridDimY,
        unsigned int gridDimZ,
        unsigned int blockDimX,
        unsigned int blockDimY,
        unsigned int blockDimZ,
        unsigned int sharedMemBytes,
        CUstream hStream,
        void** kernelParams,
        void** extra)

Here, kernelParams is the only parameter that is relevant for this question. The documentation says:

Kernel parameters can be specified via kernelParams. If f has N parameters, then kernelParams needs to be an array of N pointers. Each of kernelParams[0] through kernelParams[N-1] must point to a region of memory from which the actual kernel parameter will be copied.

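The key point here is the last sentence: the elements of the kernelParams array are not the actual kernel parameters. They only point to the actual kernel parameters.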

In fact, for a kernel that receives a single float *pointer, this has the odd effect that you could basically set up the kernel parameters as follows:

    float *pointer = allocateSomeDeviceMemory();
    float **pointerToPointer = &pointer;
    float ***pointerToPointerToPointer = &pointerToPointer;
    void **kernelParams = (void **)pointerToPointerToPointer;

(This is only meant to illustrate that this really is a pointer to a pointer; in practice, you would not write it like that.)

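Now, the "structure" of the kernel parameters is basically the same in JCuda and in CUDA. Of course, you cannot take "the address of a pointer" in Java, but the number of indirections is the same. Imagine you have a kernel like this: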

    __global__ void example(int value, float *pointer)

In the CUDA C API, you could define the kernel parameters like this:

    int value = 123;
    float *pointer = allocateSomeDeviceMemory();

    int *pointerToValue = &value;
    float **pointerToPointer = &pointer;

    void *kernelParams[] = {
        pointerToValue,
        pointerToPointer
    };

The setup is done analogously in the JCuda Java API:

    int value = 123;
    Pointer pointer = allocateSomeDeviceMemory();

    Pointer pointerToValue = Pointer.to(new int[]{value});
    Pointer pointerToPointer = Pointer.to(pointer);

    Pointer kernelParameters = Pointer.to(
        pointerToValue,
        pointerToPointer
    );

The main difference that is relevant here is that in C, you can write this more concisely, using the address operator &:

    void *kernelParams[] = {
        &value,             // This can be imagined as a pointer to an int
        &pointer            // This can be imagined as a pointer to a pointer
    };

But this is basically the same as the example that you provided:

    Pointer kernelParameters = Pointer.to(
        Pointer.to(new int[]{value}),   // A pointer to an int
        Pointer.to(pointer)             // A pointer to a pointer
    );

Again, the key point is that with something like

    void *kernelParams[] = {
        &value,
    };

or

    Pointer kernelParameters = Pointer.to(
        Pointer.to(new int[]{value})
    );

you are not passing the value directly to the kernel. Instead, you are telling CUDA: "Here is an array of pointers. The first pointer points to an int value. Copy the value from this memory location, and use it as the actual value for the kernel call."
