How to release GPU memory and use the same buffer for different arrays in PyOpenCL?

Question

Here is my working code for reference:

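import numpy
import pyopencl as cl
import pyopencl.array  # makes cl.array.vec available

vector = numpy.array([1, 2, 4, 8], numpy.float32)  # four floats, same memory layout as one cl.array.vec.float4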
matrix = numpy.zeros((1, 4), cl.array.vec.float4)
matrix[0, 0] = (1, 2, 4, 8)
matrix[0, 1] = (16, 32, 64, 128)
matrix[0, 2] = (3, 6, 9, 12)
matrix[0, 3] = (5, 10, 15, 25)
# vector[0] = (1, 2, 4, 8)


platform = cl.get_platforms()  # all OpenCL platforms that exist on this machine
device = platform[0].get_devices(device_type=cl.device_type.GPU)  # all GPUs on the first platform in the list
context = cl.Context(devices=[device[0]])  # context containing only the first GPU; context.num_devices gives the number of devices in this context
print("everything good so far")
program=cl.Program(context,"""
__kernel void matrix_dot_vector(__global const float4 *matrix, __global const float4 *vector, __global float *result)
{
    int gid = get_global_id(0);

    // dot() takes two operands of the same type, so the vector is read as a float4
    result[gid] = dot(matrix[gid], vector[0]);
}

""" ).build()
queue=cl.CommandQueue(context)
# queue=cl.CommandQueue(context,cl_device_id device) #Context specific to a device if we plan on using multiple GPUs for parallel processing

mem_flags = cl.mem_flags
matrix_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=matrix)
vector_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=vector)
matrix_dot_vector = numpy.zeros(4, numpy.float32)
global_size_of_GPU= 0
destination_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, matrix_dot_vector.nbytes)
# threads_size_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, global_size_of_GPU.nbytes)
program.matrix_dot_vector(queue, matrix_dot_vector.shape, None, matrix_buf, vector_buf, destination_buf)

## Step #11. Move the kernel’s output data to host memory.
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf)
# cl.enqueue_copy(queue, global_size_of_GPU, threads_size_buf)
print(matrix_dot_vector)
# print(global_size_of_GPU)

# COPY SAME ARRAY FROM GPU AGAIN
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf)
print(matrix_dot_vector)
print('copied same array twice')

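1. How do I free the GPU memory held by matrix_buf and destination_buf? One is read-only and the other is write-only.
2. How do I load a different matrix array into the same matrix_buf without having to create a new buffer in PyOpenCL? I have read that loading new data into an existing buffer is much faster than recreating a buffer of the same size every time.
3. Is it OK to load a new array into the old buffer if it is smaller than the array that was in the buffer before? Or does the new array have to be exactly the same size as the buffer?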

Answer 1

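Re 1. I believe the buffer is released when the variable holding it goes out of scope, or you can call release() on it explicitly. Whether the buffer is read-only or write-only does not matter here.
Re 2. Try pyopencl.enqueue_map_buffer(), which returns an array (together with an event) through which the buffer contents can be modified from the host side.
Re 3. Reusing an existing buffer and using only part of it is fine. On the kernel side you control which part is accessed.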

Answer 2

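matrix_buf.release() and destination_buf.release(): these free the memory allocated on the GPU for the respective buffers. It is best to release memory as soon as it is no longer needed, to avoid running into out-of-memory errors. When the function that uses the GPU exits and the buffers go out of scope, PyOpenCL frees the GPU memory automatically. (credit: doqtor)
cl.enqueue_copy(queue, matrix_buf, matrix_2): this loads the new matrix_2 array into matrix_buf without creating a new matrix buffer.
Reusing an existing buffer and using only part of it is fine. On the kernel side, we control which part is accessed. (credit: doqtor)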

pyopencl