使用 numba 找到两个向量之间的最小差异
Find minimum difference between two vectors with numba
我尝试使用 numba
优化搜索两个 numpy
向量之间的最小值。在我使用 prange
和 parallel=True
选项之前,速度会加快并且结果是正确的。我知道问题在于在并行执行期间共享变量 min_val
、tmp
、min_val_idx_a
、min_val_idx_b
(可能使用并行线程)。有没有办法克服这个问题并在 parallel=True
选项中使用 numba
? (这使我的简单代码快了 300 倍)
import numpy as np
from numba import jit, int32, void, double, prange
@jit(void(double[:], double[:], int32), nopython=True, parallel=True)
def lowest_value_numba(a, b, n):
# initialization
min_val_idx_a, min_val_idx_b = 0, 0
min_val = tmp = np.abs(a[0]-b[0])
for i in prange(n):
# print(i)
for j in prange(i, n):
tmp = np.abs(a[i]-b[j])
if(tmp < min_val):
min_val = tmp
min_val_idx_a = i
min_val_idx_b = j
print(min_val, min_val_idx_a, min_val_idx_b)
n = int(1e4)
a = np.random.uniform(low=0.0, high=1.0, size=n)
b = np.random.uniform(low=0.0, high=1.0, size=n)
# setting min value by setting the same valu efor a[n-1] and b[n-1]
a[n-1], b[n-1] = 1, 1
%timeit -n 1 -r 1 lowest_value_numba(a, b, n)
错误的输出(应该是0.0 9999 9999
):
0.23648058275546968 0 0
223 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
但是对于 parallel=False
输出的编译是正确的(最后的值彼此最接近):
0.0 9999 9999
65 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
如果按行并行化,则可以避免所评论的有关交叉迭代依赖性的问题。例如:
from numba import jit, int32, double, prange
from numba.types import Tuple
@jit(Tuple([double, int32, int32])(double[:], double[:], int32),
nopython=True, parallel=True)
def lowest_value_numba(a, b, n):
min_dif = np.empty_like(a)
min_j = np.empty((n,), dtype=int32)
for i in prange(n):
diff = np.abs(a[i] - b)
min_j[i] = j = np.argmin(diff)
min_dif[i] = diff[j]
i = np.argmin(min_dif)
j = min_j[i]
min_val = min_dif[i]
return min_val, i, j
结果与您的实施(使用 parallel=False
和 for j in prange(n)
测试)和蛮力 Numpy 方法一致:
def lowest_value_numpy(a, b):
diff = np.abs(np.atleast_2d(a).T - np.atleast_2d(b))
indices = np.unravel_index(diff.argmin(), diff.shape)
return diff[indices], *indices
为了完成@aerobiomat 的精彩回答,Numba 目前不支持 explicit/inferred min/max 并行缩减(在撰写本文时——对于 Numba 0.52)。
引用 documentation:
Binary operators: + - * / /? % | >> ^ << & ** //
[...]
The user is required to make sure that the loop does not have cross iteration dependencies except for supported reductions. [...]
A reduction is inferred automatically if a variable is updated by a binary function/operator using its previous value in the loop body. [...]
The compiler may not detect such cases and then a race condition would occur.
因此,您的代码格式不正确并且出现竞争条件。
或者,您可以生成所有值的矩阵并使用隐式并行 np.argmin
和 parallel=True
但@aerobiomat 提供的解决方案应该更快(因为它不会产生很大的临时矩阵)。
我尝试使用 numba
优化搜索两个 numpy
向量之间的最小值。在我使用 prange
和 parallel=True
选项之前,速度会加快并且结果是正确的。我知道问题在于在并行执行期间共享变量 min_val
、tmp
、min_val_idx_a
、min_val_idx_b
(可能使用并行线程)。有没有办法克服这个问题并在 parallel=True
选项中使用 numba
? (这使我的简单代码快了 300 倍)
import numpy as np
from numba import jit, int32, void, double, prange
@jit(void(double[:], double[:], int32), nopython=True, parallel=True)
def lowest_value_numba(a, b, n):
# initialization
min_val_idx_a, min_val_idx_b = 0, 0
min_val = tmp = np.abs(a[0]-b[0])
for i in prange(n):
# print(i)
for j in prange(i, n):
tmp = np.abs(a[i]-b[j])
if(tmp < min_val):
min_val = tmp
min_val_idx_a = i
min_val_idx_b = j
print(min_val, min_val_idx_a, min_val_idx_b)
n = int(1e4)
a = np.random.uniform(low=0.0, high=1.0, size=n)
b = np.random.uniform(low=0.0, high=1.0, size=n)
# setting min value by setting the same valu efor a[n-1] and b[n-1]
a[n-1], b[n-1] = 1, 1
%timeit -n 1 -r 1 lowest_value_numba(a, b, n)
错误的输出(应该是0.0 9999 9999
):
0.23648058275546968 0 0
223 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
但是对于 parallel=False
输出的编译是正确的(最后的值彼此最接近):
0.0 9999 9999
65 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
如果按行并行化,则可以避免所评论的有关交叉迭代依赖性的问题。例如:
from numba import jit, int32, double, prange
from numba.types import Tuple
@jit(Tuple([double, int32, int32])(double[:], double[:], int32),
nopython=True, parallel=True)
def lowest_value_numba(a, b, n):
min_dif = np.empty_like(a)
min_j = np.empty((n,), dtype=int32)
for i in prange(n):
diff = np.abs(a[i] - b)
min_j[i] = j = np.argmin(diff)
min_dif[i] = diff[j]
i = np.argmin(min_dif)
j = min_j[i]
min_val = min_dif[i]
return min_val, i, j
结果与您的实施(使用 parallel=False
和 for j in prange(n)
测试)和蛮力 Numpy 方法一致:
def lowest_value_numpy(a, b):
diff = np.abs(np.atleast_2d(a).T - np.atleast_2d(b))
indices = np.unravel_index(diff.argmin(), diff.shape)
return diff[indices], *indices
为了完成@aerobiomat 的精彩回答,Numba 目前不支持 explicit/inferred min/max 并行缩减(在撰写本文时——对于 Numba 0.52)。 引用 documentation:
Binary operators:
+ - * / /? % | >> ^ << & ** //
[...]
The user is required to make sure that the loop does not have cross iteration dependencies except for supported reductions. [...]
A reduction is inferred automatically if a variable is updated by a binary function/operator using its previous value in the loop body. [...]
The compiler may not detect such cases and then a race condition would occur.
因此,您的代码格式不正确并且出现竞争条件。
或者,您可以生成所有值的矩阵并使用隐式并行 np.argmin
和 parallel=True
但@aerobiomat 提供的解决方案应该更快(因为它不会产生很大的临时矩阵)。