ArrayFire: function with an OpenCL kernel called from main function

Question

The function is as follows (taken from http://arrayfire.org/docs/interop_opencl.htm)

Single `main` function

#include <arrayfire.h>
#include <af/opencl.h>

int main() {
    size_t length = 10;
    // Create ArrayFire array objects:
    af::array A = af::randu(length, f32);
    af::array B = af::constant(0, length, f32);
    // ... additional ArrayFire operations here
    // 2. Obtain the device, context, and queue used by ArrayFire
    static cl_context af_context = afcl::getContext();
    static cl_device_id af_device_id = afcl::getDeviceId();
    static cl_command_queue af_queue = afcl::getQueue();
    // 3. Obtain cl_mem references to af::array objects
    cl_mem * d_A = A.device<cl_mem>();
    cl_mem * d_B = B.device<cl_mem>();
    // 4. Load, build, and use your kernels.
    //    For the sake of readability, we have omitted error checking.
    int status = CL_SUCCESS;
    // A simple copy kernel, uses C++11 syntax for multi-line strings.
    const char * kernel_name = "copy_kernel";
    const char * source = R"(
        void __kernel
        copy_kernel(__global float * gA, __global float * gB)
        {
            int id = get_global_id(0);
            gB[id] = gA[id];
        }
    )";
    // Create the program, build the executable, and extract the entry point
    // for the kernel.
    cl_program program = clCreateProgramWithSource(af_context, 1, &source, NULL, &status);
    status = clBuildProgram(program, 1, &af_device_id, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, kernel_name, &status);
    // Set arguments and launch your kernels
    clSetKernelArg(kernel, 0, sizeof(cl_mem), d_A);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), d_B);
    clEnqueueNDRangeKernel(af_queue, kernel, 1, NULL, &length, NULL, 0, NULL, NULL);
    // 5. Return control of af::array memory to ArrayFire
    A.unlock();
    B.unlock();
    // ... resume ArrayFire operations
    // Because the device pointers, d_A and d_B, were returned to ArrayFire's
    // control by the unlock function, there is no need to free them using
    // clReleaseMemObject()
    return 0;
}

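This works fine: the final values of B match those of A, i.e. af_print(B); matches A. But when I move the kernel code into a separate function, as follows:

Separate function called from `main`

arraycopy function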

#include <arrayfire.h>
#include <af/opencl.h>

void arraycopy(af::array A, af::array B, size_t length) {
    // 2. Obtain the device, context, and queue used by ArrayFire   
    static cl_context af_context = afcl::getContext();
    static cl_device_id af_device_id = afcl::getDeviceId();
    static cl_command_queue af_queue = afcl::getQueue();
    // 3. Obtain cl_mem references to af::array objects
    cl_mem * d_A = A.device<cl_mem>();
    cl_mem * d_B = B.device<cl_mem>();
    // 4. Load, build, and use your kernels.
    //    For the sake of readability, we have omitted error checking.
    int status = CL_SUCCESS;
    // A simple copy kernel, uses C++11 syntax for multi-line strings.
    const char * kernel_name = "copy_kernel";
    const char * source = R"(
        void __kernel
        copy_kernel(__global float * gA, __global float * gB)
        {
            int id = get_global_id(0);
            gB[id] = gA[id];
        }
    )";
    // Create the program, build the executable, and extract the entry point
    // for the kernel.
    cl_program program = clCreateProgramWithSource(af_context, 1, &source, NULL, &status);
    status = clBuildProgram(program, 1, &af_device_id, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, kernel_name, &status);
    // Set arguments and launch your kernels
    clSetKernelArg(kernel, 0, sizeof(cl_mem), d_A);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), d_B);
    clEnqueueNDRangeKernel(af_queue, kernel, 1, NULL, &length, NULL, 0, NULL, NULL);
    // 5. Return control of af::array memory to ArrayFire
    A.unlock();
    B.unlock();
    // ... resume ArrayFire operations
    // Because the device pointers, d_A and d_B, were returned to ArrayFire's
    // control by the unlock function, there is no need to free them using
    // clReleaseMemObject()
}

main function

int main()
{
    size_t length = 10;
    af::array A = af::randu(length, f32);
    af::array B = af::constant(0, length, f32);
    arraycopy(A, B, length);
    af_print(B); // does not match A
}

The final values of B are unchanged. Why does this happen, and what should I do to make it work? Thanks in advance.

Answer 1

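You are passing the af::array objects to arraycopy by value rather than by reference, so A and B in main stay unchanged no matter what you do inside arraycopy. You can pass B by reference: af::array &B in the parameter list. I would also recommend passing A by const reference as a habit, to avoid unnecessary copies (const af::array &A).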
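As a plain C++ analogy for the point above, here is a minimal sketch in which `Buffer` (a `std::vector<float>`) is a hypothetical stand-in for af::array, and `copy_by_value` / `copy_by_reference` are illustrative names that do not come from the question. It only models the language rule; it does not model how ArrayFire shares device memory. Applied to the question's code, the corresponding change would be the signature `void arraycopy(const af::array &A, af::array &B, size_t length)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for af::array: a plain host buffer.
using Buffer = std::vector<float>;

// By value: dst is the function's own copy, so the caller's buffer is untouched.
void copy_by_value(Buffer src, Buffer dst) {
    for (std::size_t i = 0; i < src.size() && i < dst.size(); ++i) {
        dst[i] = src[i];
    }
}

// By reference: dst aliases the caller's buffer, so the writes are visible.
// src is a const reference, which avoids a copy and documents that it is read-only.
void copy_by_reference(const Buffer& src, Buffer& dst) {
    for (std::size_t i = 0; i < src.size() && i < dst.size(); ++i) {
        dst[i] = src[i];
    }
}

// Walks through both calls and checks the caller-side result of each.
void check_pass_by_reference() {
    Buffer a = {1.0f, 2.0f, 3.0f};
    Buffer b(a.size(), 0.0f);

    copy_by_value(a, b);
    assert(b == Buffer(a.size(), 0.0f));  // caller's b is still all zeros

    copy_by_reference(a, b);
    assert(b == a);                       // caller's b now matches a
}
```

Calling `check_pass_by_reference()` from a test or from `main` exercises both variants; only the by-reference version changes the caller's buffer.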

Answer 2

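The reason behind the behavior you are seeing is reference counting. It is certainly not a bug, though, and it is consistent with C++ language behavior.

af::array objects created through assignment or an equivalent operation only copy the metadata and keep a shared pointer to the data.

In the version of your code that uses a function, B is passed by value, so the B inside arraycopy is a metadata copy of the B from main and shares the pointer to the data of main's array B. At that point, if the user calls device to get the pointer, we assume the intent is to write to the memory it points to. Therefore, when device is called on an array object whose shared pointer has a reference count > 1, we make a copy of the original array (B from main) and return a pointer to that new memory. So if you call af_print(B) inside the function, you will see the correct values. This is essentially copy-on-write: since B is passed by value, you do not see the results of arraycopy's modifications to B.

As I said in the first line, this is consistent with C++ behavior: if object B needs to be modified by a function, it must be passed by reference. Passing it by value only changes the value inside the function, which is exactly how ArrayFire handles af::array objects.

Hope that clears up the confusion.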
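The copy-on-write behavior described above can be modeled in standard C++. The sketch below is an illustration only, not ArrayFire's real implementation: `CowArray`, `at`, `write_by_value`, and `write_by_reference` are hypothetical names, and a `std::shared_ptr` over a host vector stands in for the shared device buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model of a reference-counted array handle.
class CowArray {
public:
    CowArray(std::size_t n, float value)
        : data_(std::make_shared<std::vector<float>>(n, value)) {}

    // Read-only access never detaches from the shared buffer.
    float at(std::size_t i) const { return (*data_)[i]; }

    // Hands out a raw pointer for writing. If another handle shares the
    // buffer (count > 1), clone it first so that handle is not affected.
    float* device() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<float>>(*data_);
        }
        return data_->data();
    }

private:
    std::shared_ptr<std::vector<float>> data_;
};

// By value: b starts as a copy of the handle sharing the caller's buffer,
// so device() detaches and the write lands in a private copy.
float write_by_value(CowArray b) {
    b.device()[0] = 42.0f;
    return b.at(0);  // the function itself sees the new value
}

// By reference: b is the caller's own handle, the count stays at 1,
// and the write goes to the caller's buffer.
void write_by_reference(CowArray& b) {
    b.device()[0] = 42.0f;
}

// Walks through both calls and checks what the caller observes.
void check_copy_on_write() {
    CowArray b(4, 0.0f);
    assert(write_by_value(b) == 42.0f);  // visible inside the function
    assert(b.at(0) == 0.0f);             // but not in the caller
    write_by_reference(b);
    assert(b.at(0) == 42.0f);            // visible in the caller
}
```

In this model the by-value call mirrors the question's symptom: printing inside the function would show the copied values, while the caller's array is unchanged.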

Pradeep. ArrayFire Development Team.

c++

kernel

opencl

arrayfire
