How to convert the python function "any()" to CUDA python compatible code (running on GPU)?

Question

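I would like to know how to implement the numpy function any() on the GPU (using Numba python). The any() function takes an array and returns True if at least one element of the input evaluates to True.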

Something like:

@vectorize(["boolean(boolean)"], target='cuda')
def AnyFunction(a):
    return any(a)

or

@vectorize(["boolean(boolean)"], target='cuda')
def AnyFunction(a):
    for i in range(len(a)):
        if a[i]==True:
            return True
    return False

Answer 1

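The (perhaps) harder part of the any operation is the reduction. Testing each item for true/false can easily be done with something like vectorize, but combining many results into a single value (the reduction) cannot be done that way, at least not casually. In fact, vectorize was not designed to solve this kind of problem, at least not directly.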
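To make that split concrete, here is a minimal CPU-only sketch in plain NumPy/Python (not Numba; the helper name `any_as_map_reduce` is hypothetical). It writes any() as an elementwise map step, the kind of per-item work a vectorize-style ufunc can express, followed by a reduction step that folds all the per-item results into one value:

```python
import functools
import operator

import numpy as np


def any_as_map_reduce(arr):
    """Hypothetical helper: any() written as an explicit map step plus a reduce step."""
    # Map: every element is tested independently (what a vectorize-style ufunc could do).
    tests = np.asarray(arr) != 0
    # Reduce: combine all per-element results into a single value with OR.
    return functools.reduce(operator.or_, tests.tolist(), False)


print(any_as_map_reduce([0, 0, 0]))  # False
print(any_as_map_reduce([0, 5, 0]))  # True
```

The map step parallelizes trivially because no item depends on another; the reduce step is the part that needs special handling on a GPU.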

However, numba cuda provides some help for simple reductions like this one, so you are not forced to write a custom numba cuda kernel.

Here is one possible approach:

$ cat t20.py
import numpy
from numba import cuda

# cuda.reduce turns this binary combining function into a GPU reduction over an array
@cuda.reduce
def or_reduce(a, b):
    return a or b

A = numpy.ones(1000, dtype=numpy.int32)
B = numpy.zeros(1000, dtype=numpy.int32)
expect = A.any()      # numpy reduction
got = or_reduce(A)   # cuda reduction
print(expect)
print(got)
expect = B.any()      # numpy reduction
got = or_reduce(B)   # cuda reduction
print(expect)
print(got)
B[100] = 1
expect = B.any()      # numpy reduction
got = or_reduce(B)   # cuda reduction
print(expect)
print(got)

$ python t20.py
True
1
False
0
True
1
$
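`or_reduce` only defines how two values combine; a reduction over the whole array can then be evaluated in parallel as a tree, where each pass combines disjoint pairs and halves the number of partial results. The CPU sketch below (the function `tree_reduce` is hypothetical and is not Numba's actual implementation) illustrates that general strategy: 1000 elements need 10 passes instead of 999 sequential steps. Because pairs are combined in a different order than in a left-to-right loop, the combining function should be associative, which `a or b` is. Note also that `a or b` on integers returns one of the integer operands, which is consistent with the `1`/`0` printed by the CUDA reduction above instead of `True`/`False`.

```python
def tree_reduce(values, op):
    """Hypothetical helper: pairwise (tree) reduction; returns (result, number_of_passes)."""
    vals = list(values)
    if not vals:
        raise ValueError("tree_reduce needs at least one value")
    passes = 0
    while len(vals) > 1:
        # On a GPU, each pair in one pass could be combined by a different thread.
        paired = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            paired.append(vals[-1])  # carry an unpaired last element to the next pass
        vals = paired
        passes += 1
    return vals[0], passes


def or_op(a, b):
    # Same combining logic as or_reduce above.
    return a or b


data = [0] * 1000
print(tree_reduce(data, or_op))  # (0, 10)
data[100] = 1
print(tree_reduce(data, or_op))  # (1, 10)
```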

A few comments on performance:

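This is probably not the fastest way to do it. However, I got the impression from your question that you were looking for something close to plain python.
Writing a custom CUDA kernel in numba could probably do the job faster.
If you care about performance, consider combining this operation with other work you need to do on the GPU. In that case, for maximum flexibility, a custom kernel gives you the most control to get the task done with the best performance.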
