How to vectorize such an algorithm in Python?

I have four (n x 1) arrays named a, b, c, and F. I want to run this algorithm without any loops.

for i in range(n):
    if a[i] < b[i]:
        F[i] = 0
    elif a[i] > c[i]:
        F[i] = 1
    elif b[i] <= a[i] <= c[i]:
        F[i] = 2

I would like to write this code in a more vectorized way to make it more efficient, because my dataset is very large.

I think you can use boolean indexing for this task.

import numpy as np

F[np.logical_and(b <= a, a <= c)] = 2
F[a > c] = 1
F[a < b] = 0

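Note that the order of the assignments matters here to get the expected result. Where conditions overlap (for example a[i] < b[i] and a[i] > c[i] at the same time, which can happen when b[i] > c[i]), the later assignment wins, so the condition the loop checks first (a < b) has to be applied last.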
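If relying on assignment order feels fragile, one possible alternative is np.select, which takes the first matching condition for each element and so mirrors the if/elif chain directly. The following is only a minimal sketch, assuming equally shaped NumPy arrays; the helper name classify and the sample values are made up for illustration. Unlike the loop, which leaves F[i] untouched when no branch matches (for instance with NaN values), this version fills such elements with the default 2.

```python
import numpy as np

def classify(a, b, c):
    # Hypothetical helper: conditions are tested in order and the
    # first match wins, just like the if/elif chain in the loop.
    # Anything matching neither condition gets the default value 2.
    return np.select([a < b, a > c], [0, 1], default=2)

# Illustrative data; the last element has b > c, so both a < b and
# a > c hold and the first condition takes priority.
a = np.array([1.0, 5.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 6.0])
c = np.array([4.0, 4.0, 4.0, 3.0])

F = classify(a, b, c)
print(F)
```

This returns a new array rather than writing into an existing F, which may or may not suit your use case.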

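Some timeit benchmarks (timeit.timeit runs the callable 1,000,000 times by default, so the figures below are total seconds for that many calls):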

def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

def idx(F, a, b, c):
  F[np.logical_and(b <= a, a <= c)] = 2
  F[a > c] = 1
  F[a < b] = 0

(10x1) arrays:

>>> timeit.timeit(lambda: loop(F, a, b, c))
11.585818066001593
>>> timeit.timeit(lambda: idx(F, a, b, c))
3.337863392000145

(1000x1) arrays:

>>> timeit.timeit(lambda: loop(F, a, b, c))
1457.268110728999
>>> timeit.timeit(lambda: idx(F, a, b, c))
10.00236530300026
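The answer does not show how the arrays were created, so here is a rough sketch of how such a benchmark could be set up. The random data, the seed, the sentinel value -1, and the small number=10 are all assumptions chosen to keep the run short; they are not the settings behind the timings above.

```python
import timeit
import numpy as np

def loop(F, a, b, c):
    for i in range(F.shape[0]):
        if a[i] < b[i]:
            F[i] = 0
        elif a[i] > c[i]:
            F[i] = 1
        elif b[i] <= a[i] <= c[i]:
            F[i] = 2

def idx(F, a, b, c):
    F[np.logical_and(b <= a, a <= c)] = 2
    F[a > c] = 1
    F[a < b] = 0

# Assumed test data: (n x 1) arrays with b <= c everywhere.
rng = np.random.default_rng(0)
n = 1000
b = rng.random((n, 1))
c = b + rng.random((n, 1))
a = 2.0 * rng.random((n, 1))

# Start from -1 so unassigned elements would be easy to spot.
F_loop = np.full((n, 1), -1)
F_idx = np.full((n, 1), -1)

# number=10 keeps the run short; timeit's default is 1,000,000.
t_loop = timeit.timeit(lambda: loop(F_loop, a, b, c), number=10)
t_idx = timeit.timeit(lambda: idx(F_idx, a, b, c), number=10)

print(f"loop: {t_loop:.4f} s, idx: {t_idx:.4f} s")
print("same result:", np.array_equal(F_loop, F_idx))
```

Checking that both functions produce the same F before comparing timings helps confirm that the benchmark compares equivalent code.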

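If you care about performance, why not try numba? It can be roughly 10x faster than the logical operations while also saving memory. As a bonus, the loop you wrote stays intact; you only add the @njit decorator in front of the function.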

from numba import njit

@njit
def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

Comparing with @NiziL's vectorized solution, using vectors of size 100 and 1000,

timeit(lambda: loop(F, a, b, c))
timeit(lambda: idx(F, a, b, c))

gives:

# 1.0355658 (Size: 100, @njit loop)
# 4.6863165 (Size: 100, idx)

# 1.9563843 (Size: 1000, @njit loop)
# 16.658198 (Size: 1000, idx)