使用 numba 找到两个向量之间的最小差异

Find minimum difference between two vectors with numba

我尝试使用 numba 优化搜索两个 numpy 向量之间的最小值。在我使用 prangeparallel=True 选项之前,速度会加快并且结果是正确的。我知道问题在于在并行执行期间共享变量 min_valtmpmin_val_idx_amin_val_idx_b(可能使用并行线程)。有没有办法克服这个问题并在 parallel=True 选项中使用 numba ? (这使我的简单代码快了 300 倍)

import numpy as np
from numba import jit, int32, void, double, prange

@jit(void(double[:], double[:], int32), nopython=True, parallel=True)
def lowest_value_numba(a, b, n): 
    
    # initialization
    min_val_idx_a, min_val_idx_b = 0, 0
    min_val = tmp = np.abs(a[0]-b[0])
    
    for i in prange(n):
#         print(i)
        for j in prange(i, n):
            tmp = np.abs(a[i]-b[j])
            
            if(tmp < min_val):
                min_val = tmp            
                min_val_idx_a = i
                min_val_idx_b = j
    print(min_val, min_val_idx_a, min_val_idx_b)
                
n = int(1e4)
a = np.random.uniform(low=0.0, high=1.0, size=n)
b = np.random.uniform(low=0.0, high=1.0, size=n)

# setting min value by setting the same valu efor a[n-1] and b[n-1] 
a[n-1], b[n-1] = 1, 1

%timeit -n 1 -r 1 lowest_value_numba(a, b, n)

错误的输出(应该是0.0 9999 9999):

0.23648058275546968 0 0
223 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

但是对于 parallel=False 输出的编译是正确的(最后的值彼此最接近):

0.0 9999 9999
65 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

如果按行并行化,则可以避免所评论的有关交叉迭代依赖性的问题。例如:

from numba import jit, int32, double, prange
from numba.types import Tuple

@jit(Tuple([double, int32, int32])(double[:], double[:], int32),
     nopython=True, parallel=True)
def lowest_value_numba(a, b, n):

    min_dif = np.empty_like(a)
    min_j = np.empty((n,), dtype=int32)

    for i in prange(n):
        diff = np.abs(a[i] - b)
        min_j[i] = j = np.argmin(diff)
        min_dif[i] = diff[j]

    i = np.argmin(min_dif)
    j = min_j[i]
    min_val = min_dif[i]

    return min_val, i, j

结果与您的实施(使用 parallel=Falsefor j in prange(n) 测试)和蛮力 Numpy 方法一致:

def lowest_value_numpy(a, b):
    diff = np.abs(np.atleast_2d(a).T - np.atleast_2d(b))
    indices = np.unravel_index(diff.argmin(), diff.shape)
    return diff[indices], *indices

为了完成@aerobiomat 的精彩回答,Numba 目前不支持 explicit/inferred min/max 并行缩减(在撰写本文时——对于 Numba 0.52)。 引用 documentation:

Binary operators: + - * / /? % | >> ^ << & ** // [...]
The user is required to make sure that the loop does not have cross iteration dependencies except for supported reductions. [...]
A reduction is inferred automatically if a variable is updated by a binary function/operator using its previous value in the loop body. [...]
The compiler may not detect such cases and then a race condition would occur.

因此,您的代码格式不正确并且出现竞争条件。

或者,您可以生成所有值的矩阵并使用隐式并行 np.argminparallel=True 但@aerobiomat 提供的解决方案应该更快(因为它不会产生很大的临时矩阵)。