Summing the values of a 1D NumPy array inside a CUDA-compiled kernel results in an error, but the docs say it's supported?

I am trying to sum the values of an array inside a CUDA-compiled Numba function.

I have a simple piece of test code:

import numpy as np
from numba import cuda

values = np.zeros(100, dtype=np.float64)
values.fill(1)


@cuda.jit
def try_to_sum(arr):
    print(arr.sum())


d_values = cuda.to_device(values)
cuda.synchronize()

try_to_sum[1, 1](d_values)

The documentation says this is a supported function:

But it fails:

Failed in nopython mode pipeline (step: nopython frontend)
Use of unsupported NumPy function 'numpy.nditer' or unsupported use of the function.

File "../../anaconda3/envs/GpuVM/lib/python3.8/site-packages/numba/np/arraymath.py", line 167:
    def array_sum_impl(arr):
        <source elided>
        c = zero
        for v in np.nditer(arr):
        ^

During: typing of get attribute at /home/stark/anaconda3/envs/GpuVM/lib/python3.8/site-packages/numba/np/arraymath.py (167)

File "../../anaconda3/envs/GpuVM/lib/python3.8/site-packages/numba/np/arraymath.py", line 167:
    def array_sum_impl(arr):
        <source elided>
        c = zero
        for v in np.nditer(arr):
        ^

During: lowering "call_method.3 = call load_method.2(func=load_method.2, args=[], kws=(), vararg=None)" at /home/stark/Work/mmr-evolution-gpu/xtests.py (10)

I also tried np.cumsum(arr), but that fails as well:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Use of unsupported NumPy function 'numpy.cumsum' or unsupported use of the function.

File "xtests.py", line 10:
def try_to_sum(arr):
    np.cumsum(arr)
    ^

How can I do a simple sum over a 1D array of float64 values inside a CUDA kernel?

Thanks!

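Ah, I was looking at the CPU documentation... The CUDA section states that array functions are not supported in kernels, so arr.sum() is not supported there. As a quick workaround, I will have to keep the sum in a scalar variable alongside the array.

I'll leave this here in case it helps someone else.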