`np.dot` without cartesian product on remaining axes
According to the documentation:

For N dimensions, dot is a sum product over the last axis of a and the second-to-last axis of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
I want to compute the sum product over the last axis of a and the second-to-last axis of b, but without forming the cartesian product over the remaining axes, since the remaining axes have the same shape. Let me illustrate with an example:
a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
c = np.dot(a, b)
c.shape # = (11, 12, 11, 12, 13)
But I would like the shape to be (11, 12, 13). Using broadcasting, I can get the desired result:
c = np.sum(a[..., None] * b, axis=-2)
c.shape # = (11, 12, 13)
However, my arrays are relatively large, and I would like to harness the power of a parallel BLAS implementation, which np.sum does not seem to support but np.dot does. Any ideas on how to achieve this?
You can use np.einsum:
c = np.einsum('ijk,ijkl->ijl', a, b)
You can also use np.matmul:
c = np.matmul(a[..., None, :], b)[..., 0, :]
This is equivalent to the new @ operator in Python 3.5+:
c = (a[..., None, :] @ b)[..., 0, :]
There isn't much difference in performance; if anything, np.einsum seems to be slightly faster for your example arrays:
In [1]: %%timeit a = np.random.randn(11, 12, 13); b = np.random.randn(11, 12, 13, 13)
....: np.einsum('...i,...ij->...j', a, b)
....:
The slowest run took 5.24 times longer than the fastest. This could mean that an
intermediate result is being cached.
10000 loops, best of 3: 26.7 µs per loop
In [2]: %%timeit a = np.random.randn(11, 12, 13); b = np.random.randn(11, 12, 13, 13)
....: np.matmul(a[..., None, :], b)[..., 0, :]
....:
10000 loops, best of 3: 28 µs per loop
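As a quick sanity check (a minimal sketch using the shapes from the question), all three alternatives produce the same result as the broadcasting baseline:

```python
import numpy as np

# Example arrays with matching leading axes, as in the question
a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))

# Broadcasting baseline from the question: sum over the contracted axis
c_sum = np.sum(a[..., None] * b, axis=-2)

# The three equivalent alternatives
c_einsum = np.einsum('ijk,ijkl->ijl', a, b)
c_matmul = np.matmul(a[..., None, :], b)[..., 0, :]
c_at = (a[..., None, :] @ b)[..., 0, :]

assert c_sum.shape == (11, 12, 13)
assert np.allclose(c_sum, c_einsum)
assert np.allclose(c_sum, c_matmul)
assert np.allclose(c_sum, c_at)
```

The matmul/@ variants work by treating a as a stack of (1, 13) row vectors, multiplying each against the corresponding (13, 13) matrix in b, then dropping the singleton axis.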