numpy 元素外积

numpy elementwise outer product

我想在 numpy 中对两个二维数组进行逐元素外积。

A.shape = (100, 3) # A numpy ndarray
B.shape = (100, 5) # A numpy ndarray

C = element_wise_outer_product(A, B) # A function that does the trick
C.shape = (100, 3, 5) # This should be the result
C[i] = np.outer(A[i], B[i]) # This should be the result

一个天真的实现可以如下。

tmp = []
for i in range(len(A):
    outer_product = np.outer(A[i], B[i])
    tmp.append(outer_product)
C = np.array(tmp)

一个更好的解决方案,灵感来自堆栈溢出。

big_outer = np.multiply.outer(A, B)
tmp = np.swapaxes(tmp, 1, 2)
C_tmp = [tmp[i][i] for i in range(len(A)]
C = np.array(C_tmp)

我正在寻找摆脱 for 循环的矢量化实现。 有人有想法吗? 谢谢!

AB 扩展到 3D 保持它们的第一个轴对齐并分别沿第三个和第二个轴引入新轴 None/np.newaxis and then multiply with each other. This would allow broadcasting 以发挥作用矢量化解决方案。

因此,一个实现将是 -

A[:,:,None]*B[:,None,:]

我们可以通过对 A 使用 ellipsis 来稍微缩短它::,: 并跳过使用 B 列出剩余的最后一个轴,就像这样 -

A[...,None]*B[:,None]

作为另一种向量化方法,我们也可以使用 np.einsum,一旦我们通过字符串符号语法并将这些符号视为涉及朴素循环实现的迭代器的代表,这可能会更直观,就像这样-

np.einsum('ij,ik->ijk',A,B)

另一个解决方案使用 np.lib.stride_tricks.as_strided()..

这里的策略实质上是构建一个 (100, 3, 5) 数组 As 和一个 (100, 3, 5) 数组 Bs 这样正常的 element-wise 产品这些阵列将产生所需的结果。当然,由于 as_strided(),我们 实际上 构建大内存消耗数组。 (as_strided() 就像一个蓝图,它告诉 NumPy 如何 将原始数组中的数据映射到构造 AsBs。)

def outer_prod_stride(A, B):
    """stride trick"""
    a = A.shape[-1]
    b = B.shape[-1]
    d = A.strides[-1]
    new_shape = A.shape + (b,)
    As = np.lib.stride_tricks.as_strided(A, shape=new_shape, strides=(a*d, d, 0))
    Bs = np.lib.stride_tricks.as_strided(B, shape=new_shape, strides=(b*d, 0, d))
    return As * Bs

时间

def outer_prod_broadcasting(A, B):
    """Broadcasting trick"""
    return A[...,None]*B[:,None]

def outer_prod_einsum(A, B):
    """einsum() trick"""
    return np.einsum('ij,ik->ijk',A,B)

def outer_prod_stride(A, B):
    """stride trick"""
    a = A.shape[-1]
    b = B.shape[-1]
    d = A.strides[-1]
    new_shape = A.shape + (b,)
    As = np.lib.stride_tricks.as_strided(A, shape=new_shape, strides=(a*d, d, 0))
    Bs = np.lib.stride_tricks.as_strided(B, shape=new_shape, strides=(b*d, 0, d))
    return As * Bs

%timeit op1 = outer_prod_broadcasting(A, B)
2.54 µs ± 436 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit op2 = outer_prod_einsum(A, B)
3.03 µs ± 637 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit op3 = outer_prod_stride(A, B)
16.6 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

似乎我的步幅技巧解决方案比@Divkar 的两个解决方案都慢。 ..仍然是一个值得了解的有趣方法。