In Python, how do I pass a scalar argument to an OpenCL kernel?
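I'm using the Python bindings for OpenCL. I've written a kernel that expects a scalar (float) argument, but I can't work out the proper way to pass it.
If I simply call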
prg.bloop(queue, [width,height], None, centers_g, 16/9, result_g)
I get this error:
pyopencl.cffi_cl.LogicError: when processing argument #2 (1-based): 'float' does not support the buffer interface
If I wrap it with numpy.float32(16/9), the kernel behaves as if it had been passed 0 instead of 1.7777777.
Here is more of the source code, to help you understand what I'm doing.
def mission2(cells, width, height):
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    centers = numpy.array(list(cells), dtype=numpy.float32)
    centers_g = cl.Buffer(ctx, cl.mem_flags.READ_ONLY|cl.mem_flags.COPY_HOST_PTR, hostbuf = centers)
    prg = cl.Program(ctx, """
    __kernel void test1(__global char *out) {
        out[0] = 77;
    }
    float sphere(float r, float x, float y)
    {
        float q = r*r-(x*x+y*y);
        if (q<0)
            return 0;
        return sqrt(q);
    }
    __kernel void bloop(__global const float centers[][6], float aspect, __global char * out)
    {
        int u = get_global_id(0);
        int v = get_global_id(1);
        int width = get_global_size(0);
        int height = get_global_size(1);
        float x = u/(float)width * aspect;
        float y = v/(float)height;
        float max = sphere(0.3, x-centers[0][0], y-centers[0][1]);
        int idx = u+v*width;
        out[idx] = 255*max;
    }
    """).build()
    result_g = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, width*height)
    if False:
        prg.test1(queue, [width,height], None, result_g)
    else:
        prg.bloop(queue, [width,height], None, centers_g, numpy.float32(16/9), result_g)
    result = numpy.zeros([height,width], dtype=numpy.uint8)
    future = cl.enqueue_copy(queue, result, result_g)
    future.wait()
    print(result)
    imsave("/tmp/bloop.png", result, cmap = matplotlib.pyplot.get_cmap('gray'))
To pass a scalar to an OpenCL kernel, you can use the set_scalar_arg_dtypes method like this:
kernel = prg.bloop
kernel.set_scalar_arg_dtypes( [None, numpy.float32, None] )
kernel(queue, [width, height], None, centers_g, aspect, result_g)
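To illustrate why a bare Python float is rejected while a declared 32-bit type is usable, here is a minimal standard-library sketch. It assumes that a `float` kernel parameter ultimately needs a 4-byte single-precision value; the `struct` calls only stand in for that conversion and are not PyOpenCL's actual implementation.

```python
import struct

aspect = 16 / 9  # a Python 3 float, stored internally as a 64-bit double

# A Python float does not expose the buffer interface, which is what the
# error message in the question complains about.
try:
    memoryview(aspect)
    float_has_buffer = True
except TypeError:
    float_has_buffer = False

# Packing the value as a 32-bit float gives the 4 bytes that an OpenCL
# `float` parameter occupies (native byte order, standard size).
packed = struct.pack("=f", aspect)
roundtrip = struct.unpack("=f", packed)[0]

print(float_has_buffer)
print(len(packed))
print(roundtrip)
```

The round-tripped value is close to 1.7777777 but not identical to the double, since single precision keeps roughly seven significant digits.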
The manual page explicitly states that it will not work if you write it like this:
prg.bloop.set_scalar_arg_dtypes( [None, numpy.float32, None] )
# this will fail:
prg.bloop(queue, [width, height], None, centers_g, 16/9, result_g)
because "The information set by this routine is attached to a single kernel instance. A new kernel instance is created every time you use program.kernel attribute access."
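The behavior described in that quote can be imitated in plain Python without an OpenCL device. The sketch below is only an analogy: `ToyProgram` and `ToyKernel` are hypothetical stand-ins, not PyOpenCL classes, and the dtype is written as a string to avoid needing numpy.

```python
class ToyKernel:
    """Hypothetical stand-in for a kernel object; not a PyOpenCL class."""

    def __init__(self, name):
        self.name = name
        self.scalar_arg_dtypes = None  # unset until configured

    def set_scalar_arg_dtypes(self, dtypes):
        self.scalar_arg_dtypes = list(dtypes)


class ToyProgram:
    """Hypothetical stand-in that builds a new kernel on every attribute access."""

    def __getattr__(self, name):
        # Called for any attribute that is not found normally, e.g. `bloop`.
        return ToyKernel(name)


prg = ToyProgram()

# Configuring one temporary instance and then fetching another loses the setting.
prg.bloop.set_scalar_arg_dtypes([None, "float32", None])
lost = prg.bloop.scalar_arg_dtypes

# Keeping a reference to a single instance preserves it.
kernel = prg.bloop
kernel.set_scalar_arg_dtypes([None, "float32", None])
kept = kernel.scalar_arg_dtypes

print(lost)  # None
print(kept)  # [None, 'float32', None]
```

This is why the working version above stores `prg.bloop` in a variable first and calls that same object.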