Issue with memory allocation in beginner OpenCL code

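I'm trying to run a beginner-level OpenCL test on an Intel CPU with integrated Iris graphics. I compile the code with standard g++ and the -framework OpenCL switch. I've tried cleaning up the code by running it under gdb and by following some online guides, but I still see an error that I assume is related to memory allocation. I've pasted my entire code below; if you spot anything obviously wrong, please help.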

Apologies for the lengthy comments. If I've made any wrong assumptions there as well, please let me know :)

#include <iostream>
#include <OpenCL/opencl.h>
#include <cassert>
    
// the kernel that we want to execute on the device.
// here, you are doing an addition of elements in an array.
const char* kernelAdd =
{
    "__kernel void add (global int* data)\n"
    "{\n" 
    "   int work_item_id = get_global_id(0);\n"
    "   data[work_item_id] *= 2;\n"
    "}\n"
};

int main (int argc, char* argv[]) 
{
    cl_int ret_val;

    // getting the platform ID that can be used - here we are getting only one
    cl_platform_id platformID;
    cl_uint numPlatforms;
    if((clGetPlatformIDs(1, &platformID, &numPlatforms)))
        std::cout << "clGetPlatformIDs failed!" << std::endl;

    // getting OpenCL device ID for our GPU - here too, we are getting only one
    cl_device_id deviceID;
    cl_uint numDevices;
    if((clGetDeviceIDs(platformID, CL_DEVICE_TYPE_GPU, 1, &deviceID, &numDevices)))
        std::cout << "clGetDeviceIDs failed!" << std::endl;

    // printing out some device info. here we have chosen CL_DEVICE_NAME.
    // you can choose any others by referring
    // https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clGetDeviceInfo.html
    typedef char typeInfo;
    size_t sizeInfo = 16*sizeof(typeInfo);
    typeInfo* deviceInfo = new typeInfo(sizeInfo); 
    if((clGetDeviceInfo(deviceID, CL_DEVICE_NAME, sizeInfo, (void*) deviceInfo, NULL)))
        std::cout << "clGetDeviceInfo failed!" << std::endl;

    std::cout << "CL_DEVICE_NAME = " << deviceInfo << ", platform ID = ";
    std::cout << platformID << ", deviceID = " << deviceID << std::endl;
    
    // set up a context for our device
    cl_context_properties contextProp[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties) platformID, 0};
    cl_context context = clCreateContext(contextProp, 1, &deviceID, NULL, NULL, &ret_val);
    if (ret_val)
        std::cout << "clCreateContext failed!" << std::endl;

    // set up a queue for our device
    cl_command_queue queue = clCreateCommandQueue(context, deviceID, (cl_command_queue_properties) NULL, &ret_val);
    if (ret_val)
        std::cout << "clCreateCommandQueue failed!" << std::endl;

    // creating our data set that we want to compute on
    int N = 1 << 4;
    size_t data_size = sizeof(int) * N;
    int* input_data = new int(N);
    int* output_data = new int(N);

    for (int i = 0; i < data_size; i++)
    {
        input_data[i] = rand() % 1000;
    }

    // create a buffer to where you will eventually enqueue the program for the device
    cl_mem buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, data_size, NULL, &ret_val);
    if (ret_val)
        std::cout << "clCreateBuffer failed!" << std::endl;

    // copying our data set to the buffer
    if((clEnqueueWriteBuffer(queue, buffer, CL_TRUE, 0, data_size, input_data, 0, NULL, NULL)))
        std::cout << "clEnqueueWriteBuffer failed!" << std::endl;

    // we compile the device program with our source above and create a kernel for it.
    // also, we are allowed to create a device program with a binary that we can point to.
    cl_program program = clCreateProgramWithSource(context, 1, (const char**) &kernelAdd, NULL, &ret_val); 
    if (ret_val)
        std::cout << "clCreateProgramWithSource failed!" << std::endl; 

    if((clBuildProgram(program, 1, &deviceID, NULL, NULL, NULL)))
        std::cout << "clBuildProgram failed!" << std::endl; 

    cl_kernel kernel = clCreateKernel(program, "add", &ret_val);
    if (ret_val)
        std::cout << "clCreateKernel failed! ret_val = " << ret_val << std::endl;

    // configure options to find the arguments to the kernel
    if((clSetKernelArg(kernel, 0, sizeof(buffer), &buffer)))
        std::cout << "clSetKernelArg failed!" << std::endl;

    // the total number of work items that we want to use
    const size_t global_dimensions[3] = {data_size, 0, 0};
    if((clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_dimensions, NULL, 0, NULL, NULL)))
        std::cout << "clEnqueueNDRangeKernel failed!" << std::endl;

    // read back output into another buffer
    ret_val = clEnqueueReadBuffer(queue, buffer, CL_TRUE, 0, data_size, output_data, 0, NULL, NULL);
    if(ret_val)
        std::cout << "clEnqueueReadBuffer failed! ret_val = " << ret_val << std::endl;

    std::cout << "Kernel completed" << std::endl;

    // Release kernel, program, and memory objects
    if(clReleaseMemObject(buffer))
        std::cout << "clReleaseMemObject failed!" << std::endl;

    if(clReleaseKernel(kernel))
        std::cout << "clReleaseKernel failed!" << std::endl;

    if(clReleaseProgram(program))
        std::cout << "clReleaseProgram failed!" << std::endl;

    if(clReleaseCommandQueue(queue))
        std::cout << "clReleaseCommandQueue failed!" << std::endl;

    if(clReleaseContext(context))
        std::cout << "clReleaseContext failed!" << std::endl; 

    for (int i = 0; i < data_size; i++)
    {
        assert(output_data[i] == input_data[i]/2);
    }

    return 0;
}

The output is as follows:

CL_DEVICE_NAME = Iris, platform ID = 0x7fff0000, deviceID = 0x1024500
objc[1034]: Method cache corrupted. This may be a message to an invalid object, or a memory error somewhere else.
objc[1034]: receiver 0x7fefb8712a90, SEL 0x7fff7ce87c58, isa 0x7fff99268208, cache 0x7fff99268218, buckets 0x7fefb87043c0, mask 0x3, occupied 0x1
objc[1034]: receiver 48 bytes, buckets 64 bytes
objc[1034]: selector 'dealloc'
objc[1034]: isa 'OS_xpc_array'
objc[1034]: Method cache corrupted. This may be a message to an invalid object, or a memory error somewhere else.
make: *** [all] Abort trap: 6

This is a very common mistake:

int* input_data = new int(N);

should be

int* input_data = new int[N];

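Your version allocates a single int and initializes it to N. To allocate N integers, you need square brackets. The same applies to output_data.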
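
As a minimal sketch of an alternative, assuming the host arrays do not have to be raw pointers, std::vector avoids the parentheses-versus-brackets pitfall altogether. The helper names make_input and byte_size are hypothetical and not part of the original code. Note also that the loops in the question run up to data_size (a byte count, 64 with 4-byte ints) rather than N (16 elements), so they would still write past the end of a correctly allocated array; the sketch keeps the two quantities separate.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not in the original post): builds a host-side
// array of n pseudo-random ints in the range [0, 1000).
std::vector<int> make_input(std::size_t n)
{
    std::vector<int> data(n);            // n elements, each initialized to 0
    for (std::size_t i = 0; i < n; ++i)  // iterate over elements, not bytes
        data[i] = std::rand() % 1000;
    return data;
}

// Hypothetical helper: size in bytes, which is what buffer calls such as
// clCreateBuffer and clEnqueueWriteBuffer expect.
std::size_t byte_size(const std::vector<int>& v)
{
    return v.size() * sizeof(int);
}
```

With this in place, the vector's data() pointer and byte_size(v) could be passed wherever the question's code passes input_data and data_size, and no delete[] is needed afterwards.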