pyopencl global_work_offset kernel argument
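I want to use the global_work_offset argument of the OpenCL API function clEnqueueNDRangeKernel, but I can't figure out how to do that with the pyopencl API.
Here is demo code in which I would like to add an offset of 2 to the kernel call, so that get_global_id(0) starts at 2 instead of 0: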
import pyopencl as cl
import pyopencl.array
import numpy as np
platform = cl.get_platforms()[0]
devices = platform.get_devices()[1] #gpu
context = cl.Context(devices=[devices])
queue = cl.CommandQueue(context)
kernel = cl.Program(context, """
__kernel void derp(global char* a) {
a[get_global_id(0)] = 1;
}""").build()
buffarr = cl.array.zeros(queue, 4, dtype=np.uint8)
kernel.derp(queue, (2,), None, buffarr.data)
np_data = buffarr.get()
# within this demo the buffer contains currently [1,1,0,0]
assert np.array_equal(np_data, [0,0,1,1])
How can I change the code so that the assertion does not fail? I don't want to add an extra argument to the kernel code here.
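According to the documentation, you can pass global_offset as a named argument.
The global size counts work-items, not buffer elements, so with an offset of 2 only 2 work-items are needed to reach indices 2 and 3 of the 4-element buffer.
The kernel call becomes:
kernel.derp(queue, (2,), None, buffarr.data, global_offset=(2,))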
The modified program:
import pyopencl as cl
import pyopencl.array
import numpy as np
platform = cl.get_platforms()[2]
print(platform)
devices = platform.get_devices()[0] #gpu
context = cl.Context(devices=[devices])
queue = cl.CommandQueue(context)
kernel = cl.Program(context, """
__kernel void derp(global char* a) {
a[get_global_id(0)] = 1;
}""").build()
buffarr = cl.array.zeros(queue, 4, dtype=np.uint8)
# global size (2,) is the number of work-items, not the buffer length;
# with global_offset=(2,) get_global_id(0) takes the values 2 and 3
kernel.derp(queue, (2,), None, buffarr.data, global_offset=(2,))
np_data = buffarr.get()
print(np_data)
# without the offset the buffer would contain [1,1,0,0]
assert np.array_equal(np_data, [0,0,1,1])
print("Ok")
After execution:
On device 0
<pyopencl.Platform 'Intel(R) OpenCL' at 0x60bdc0>
[0 0 1 1]
Ok
On device 1
<pyopencl.Platform 'Experimental OpenCL 2.0 CPU Only Platform' at 0xb60a20>
[0 0 1 1]
Ok
On device 2
<pyopencl.Platform 'NVIDIA CUDA' at 0xff0440>
[0 0 1 1]
Ok
Tested with Python 2.7.11 [MSC v.1500 64 bit (AMD64)] - pyopencl (2015, 1)
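The index arithmetic behind global_offset can be checked without any OpenCL device. The following is only a plain-Python sketch, not pyopencl code: global_ids and emulate_derp are hypothetical helper names introduced here for illustration, and the sketch assumes a one-dimensional launch in which get_global_id(0) runs from the offset to offset + global size - 1. Instead of writing past the end of the buffer, it reports the ids that would fall outside it.

```python
def global_ids(global_size, global_offset=0):
    """Return the get_global_id(0) values of a 1-D launch (hypothetical helper)."""
    return [global_offset + i for i in range(global_size)]


def emulate_derp(buffer_len, global_size, global_offset=0):
    """Emulate the derp kernel on a Python list.

    Returns the resulting buffer and the list of global ids that would
    have fallen outside the buffer (these are skipped, not written).
    """
    buf = [0] * buffer_len
    out_of_range = []
    for gid in global_ids(global_size, global_offset):
        if gid < buffer_len:
            buf[gid] = 1  # same effect as a[get_global_id(0)] = 1
        else:
            out_of_range.append(gid)
    return buf, out_of_range


print(emulate_derp(4, 2))     # no offset: ids 0 and 1 are written
print(emulate_derp(4, 2, 2))  # offset 2: ids 2 and 3 are written
print(emulate_derp(4, 4, 2))  # 4 work-items at offset 2: ids 4 and 5 miss the buffer
```

The last call shows why the global size should be the number of work-items to launch rather than the buffer length: with 4 work-items and an offset of 2, two of the ids lie beyond a 4-element buffer.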