How can I traverse a 3D matrix on a thread-per-row basis in Numba?
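I'm trying to implement a run-based kernel with Numba CUDA, and I need to traverse the elements of a 3D matrix row by row on a per-thread basis. That is, each thread is assigned one row and iterates over all of that row's elements.
For example, if for simplicity I were to use a 2D matrix with 50 rows and 100 columns, I would need to create 50 threads, each traversing the 100 elements of its own row.
Can anyone tell me how to do this?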
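It turned out to be quite simple. You just launch as many threads as there are rows and let the kernel work out which way to go: each thread takes its row (and slice) index from the grid and loops over the columns itself.
Here is a simple kernel that demonstrates this kind of iteration over a 3D matrix (binary_image). The kernel is part of a CCL algorithm I'm implementing, but those details can be safely ignored: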
from numba import cuda

@cuda.jit
def kernel_1(binary_image, image_width, s_matrix, labels_matrix):
    # notice how we're only getting the row and depth of each thread
    row, image_slice = cuda.grid(2)
    sm_pos, lm_pos = 0, 0
    span_found = False
    if row < binary_image.shape[0] and image_slice < binary_image.shape[2]:  # guard for rows and slices
        # and here's the traversing over the columns
        for column in range(binary_image.shape[1]):
            if binary_image[row, column, image_slice] == 0:
                if not span_found:  # Connected Component found
                    span_found = True
                    s_matrix[row, sm_pos, image_slice] = column
                    sm_pos = sm_pos + 1
                    # converting 2D coordinate to 1D
                    linearized_index = row * image_width + column
                    labels_matrix[row, lm_pos, image_slice] = linearized_index
                    lm_pos = lm_pos + 1
                else:
                    s_matrix[row, sm_pos, image_slice] = column
            elif binary_image[row, column, image_slice] == 255 and span_found:
                span_found = False
                s_matrix[row, sm_pos, image_slice] = column - 1
                sm_pos = sm_pos + 1
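
The answer doesn't show the launch itself, so here is a minimal sketch of how the grid for kernel_1 might be sized on the host. The helper name launch_config, the (16, 16) block size, and the example shape are my own assumptions, not part of the original code. The idea is that the x dimension of the grid covers the rows and the y dimension covers the slices, matching the order in which cuda.grid(2) is unpacked in the kernel:

```python
import math


def launch_config(image_shape, threads_per_block=(16, 16)):
    """Return (blocks_per_grid, threads_per_block) for one thread per (row, slice).

    image_shape is (rows, columns, slices), the same layout that kernel_1
    indexes as binary_image[row, column, image_slice].
    The (16, 16) default block size is an arbitrary illustrative choice.
    """
    rows, _columns, slices = image_shape
    # Round up so every row and every slice gets a thread; the surplus
    # threads are the ones rejected by the bounds check in the kernel.
    blocks_x = math.ceil(rows / threads_per_block[0])
    blocks_y = math.ceil(slices / threads_per_block[1])
    return (blocks_x, blocks_y), threads_per_block


# Hypothetical example: 50 rows, 100 columns, 3 slices.
blocks_per_grid, threads_per_block = launch_config((50, 100, 3))

# The launch would then look like this (arrays assumed to be allocated by the caller):
# kernel_1[blocks_per_grid, threads_per_block](
#     binary_image, binary_image.shape[1], s_matrix, labels_matrix)
```

Because the block counts are rounded up, more threads are launched than there are rows and slices, which is why the guard on row and image_slice in the kernel matters. Columns never appear in the launch configuration, since each thread walks them in its own loop.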