PyOpenCL program does not return expected output
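I have just started learning OpenCL through PyOpenCL and have been following some tutorials. I am working through the script here. The program runs without any errors, but the sum of the arrays is incorrect. This is the exact code: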
# Use OpenCL To Add Two Random Arrays (This Way Shows Details)
import pyopencl as cl # Import the OpenCL GPU computing API
import numpy as np # Import Np number tools
platform = cl.get_platforms()[0] # Select the first platform [0]
for device in platform.get_devices():
    print device
device = platform.get_devices()[2] # Select the third device on this platform [2]
context = cl.Context([device]) # Create a context with your device
queue = cl.CommandQueue(context) # Create a command queue with your context
np_a = np.random.rand(5).astype(np.float32) # Create a random np array
np_b = np.random.rand(5).astype(np.float32) # Create a random np array
np_c = np.empty_like(np_a) # Create an empty destination array
cl_a = cl.Buffer(context, cl.mem_flags.COPY_HOST_PTR, hostbuf=np_a)
cl_b = cl.Buffer(context, cl.mem_flags.COPY_HOST_PTR, hostbuf=np_b)
cl_c = cl.Buffer(context, cl.mem_flags.WRITE_ONLY, np_c.nbytes)
# Create three buffers (plans for areas of memory on the device)
kernel = \
"""
__kernel void sum(__global float* a, __global float* b, __global float* c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
""" # Create a kernel (a string containing C-like OpenCL device code)
program = cl.Program(context, kernel).build()
# Compile the kernel code into an executable OpenCL program
program.sum(queue, np_a.shape, None, cl_a, cl_b, cl_c)
# Enqueue the program for execution, causing data to be copied to the device
# - queue: the command queue the program will be sent to
# - np_a.shape: a tuple of the arrays' dimensions
# - cl_a, cl_b, cl_c: the memory spaces this program deals with
queue.finish()
np_arrays = [np_a, np_b, np_c]
cl_arrays = [cl_a, cl_b, cl_c]
for x in range(3):
    cl.enqueue_copy(queue, cl_arrays[x], np_arrays[x])
# Copy the data for array c back to the host
arrd = {"a":np_a, "b":np_b, "c":np_c}
for k in arrd:
    print k + ": ", arrd[k]
# Print all three host arrays, to show sum() worked
And the output:
<pyopencl.Device 'Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz' on 'Apple' at 0xffffffff>
<pyopencl.Device 'Iris Pro' on 'Apple' at 0x1024500>
<pyopencl.Device 'AMD Radeon R9 M370X Compute Engine' on 'Apple' at 0x1021c00>
a: [ 0.44930401 0.77514887 0.28574091 0.24021916 0.3193087 ]
c: [ 0.0583559 0.85157514 0.80443901 0.09400933 0.87276274]
b: [ 0.81869799 0.49566364 0.85423696 0.68896079 0.95608395]
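My guess is that the data is being copied correctly between the host and the device, but the kernel is not being executed. From what I understand from this tutorial and others, the code should be enough to execute the kernel. Is another call needed to launch the kernel? I'm not sure which version of PyOpenCL this example was written for, but I am running 2016.2 from conda-forge on Mac OS X. Any help is much appreciated.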
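You are calling enqueue_copy with the arguments in the wrong order: it takes the destination first and the source second.
You should call it like this: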
cl.enqueue_copy(queue, np_arrays[x], cl_arrays[x])
On another note, you do not need to copy the input arrays back, since you already created them on the host.
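To see why c came out as unrelated numbers rather than the sum, here is a minimal NumPy-only sketch of the argument-order mistake. It does not use PyOpenCL at all: `copy` is a hypothetical helper that only mimics the (dest, src) ordering, and `device_c` is an ordinary array standing in for the device buffer after the kernel has run. With the arguments swapped, the host placeholder overwrites the computed result and the host array never receives it.

```python
import numpy as np


def copy(dest, src):
    """Hypothetical stand-in for a (dest, src) copy: overwrite dest with src."""
    np.copyto(dest, src)


# Host arrays, playing the role of np_a, np_b and np_c in the question.
host_a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
host_b = np.array([10.0, 20.0, 30.0], dtype=np.float32)
host_c = np.zeros_like(host_a)  # placeholder for the uninitialised np_c

# Assumption: a plain array models the device buffer cl_c after the kernel ran.
device_c = host_a + host_b

# Swapped order (the bug): the host placeholder overwrites the device result.
buggy_device_c = device_c.copy()
buggy_host_c = host_c.copy()
copy(buggy_device_c, buggy_host_c)

# Correct order: destination (host) first, source (device) second.
fixed_host_c = host_c.copy()
copy(fixed_host_c, device_c)

print("buggy host c:", buggy_host_c)
print("fixed host c:", fixed_host_c)
```

In the real script the same reasoning applies only to the output: a single cl.enqueue_copy(queue, np_c, cl_c) after the kernel call is enough, since np_a and np_b never change on the host.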