Difference between numpy dot() and Python 3.5+ matrix multiplication @

Question

I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot function. For example, for 3-D arrays:

import numpy as np

a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b  # Python 3.5+
d = np.dot(a, b)

The @ operator returns an array of shape:

c.shape
(8, 13, 13)

whereas the np.dot() function returns:

d.shape
(8, 13, 8, 13)

How can I reproduce the same result with numpy dot? Are there any other significant differences?

Answer 1

The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.

>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
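The question also asks how to get the @ result out of np.dot. A minimal sketch, reusing the random 3-D arrays from the question: either call np.dot on each pair of matrices in the stack, or compute the full dot result and keep only its "diagonal" blocks d[n, :, n, :]. The second option builds the whole (8, 13, 8, 13) array first, so it wastes memory and time; it is shown only to make the relationship between the two results explicit.

```python
import numpy as np

a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)

c = a @ b          # (8, 13, 13): a[n] @ b[n] for each n
d = np.dot(a, b)   # (8, 13, 8, 13): every a[n] against every b[p]

# Option 1: loop over the stack and call np.dot on each 2-D pair.
c_loop = np.array([np.dot(x, y) for x, y in zip(a, b)])

# Option 2: keep only the blocks where both stack indices match, d[n, :, n, :].
# The two index arrays are separated by a slice, so NumPy moves the indexed
# axis to the front and the result has shape (8, 13, 13).
n = np.arange(a.shape[0])
c_diag = d[n, :, n, :]

print(np.allclose(c, c_loop), np.allclose(c, c_diag))
```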

From the documentation:

matmul differs from dot in two important ways.

Multiplication by scalars is not allowed.

Stacks of matrices are broadcast together as if the matrices were elements.
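The first quoted difference, that matmul does not accept scalars, is easy to check. A small sketch; the array a here is just an arbitrary example:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# np.dot treats a 0-d (scalar) operand as plain multiplication.
scaled = np.dot(3, a)
print(scaled)

# np.matmul refuses scalar operands; use * for scaling instead.
matmul_failed = False
try:
    np.matmul(3, a)
except ValueError as err:
    matmul_failed = True
    print("np.matmul raised ValueError:", err)
```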

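The last point makes it clear that the dot and matmul methods behave differently when passed 3-D (or higher-dimensional) arrays. Quoting some more from the documentation: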

For matmul:

If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

For np.dot:

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
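Those two rules can be written down as shape formulas. The helpers dot_result_shape and matmul_result_shape below are hypothetical names used only for illustration; they assume both inputs are at least 2-D, and np.broadcast_shapes needs NumPy 1.20 or newer. The loop checks the formulas against the real functions for the shapes used in this thread:

```python
import numpy as np

def dot_result_shape(a_shape, b_shape):
    """Hypothetical helper: shape produced by np.dot for N-D inputs (both at least 2-D)."""
    # The last axis of a is summed against the second-to-last axis of b;
    # the remaining axes of a come first, followed by the remaining axes of b.
    return a_shape[:-1] + b_shape[:-2] + b_shape[-1:]

def matmul_result_shape(a_shape, b_shape):
    """Hypothetical helper: shape produced by np.matmul for inputs that are at least 2-D."""
    # The leading "stack" axes are broadcast; the last two axes are multiplied as matrices.
    stack = np.broadcast_shapes(a_shape[:-2], b_shape[:-2])
    return stack + (a_shape[-2], b_shape[-1])

for a_shape, b_shape in [((8, 13, 13), (8, 13, 13)), ((5, 6, 2, 4), (6, 4, 3))]:
    a, b = np.ones(a_shape), np.ones(b_shape)
    assert np.dot(a, b).shape == dot_result_shape(a_shape, b_shape)
    assert np.matmul(a, b).shape == matmul_result_shape(a_shape, b_shape)
    print(a_shape, b_shape, "dot:", np.dot(a, b).shape, "matmul:", np.matmul(a, b).shape)
```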

Answer 2

@ajcr's answer explains how dot and matmul (invoked by the @ symbol) differ. A simple example makes it clear how the two behave differently when operating on 'stacks of matrices' or tensors.

To clarify the differences, take a 4x4 array and compute both its dot product and its matmul product with a 3x4x2 'stack of matrices' or tensor.

import numpy as np
fourbyfour = np.array([
                       [1,2,3,4],
                       [3,2,1,4],
                       [5,4,6,7],
                       [11,12,13,14]
                      ])


threebyfourbytwo = np.array([
                             [[2,3],[11,9],[32,21],[28,17]],
                             [[2,3],[1,9],[3,21],[28,7]],
                             [[2,3],[1,9],[3,21],[28,7]],
                            ])

print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour,threebyfourbytwo)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour,threebyfourbytwo)))

The result of each operation is shown below. Note how the dot product is

...a sum product over the last axis of a and the second-to-last of b

and how the matrix product is formed by broadcasting the matrices together.

4x4*3x4x2 dot:
 [[[232 152]
  [125 112]
  [125 112]]

 [[172 116]
  [123  76]
  [123  76]]

 [[442 296]
  [228 226]
  [228 226]]

 [[962 652]
  [465 512]
  [465 512]]]

4x4*3x4x2 matmul:
 [[[232 152]
  [172 116]
  [442 296]
  [962 652]]

 [[125 112]
  [123  76]
  [228 226]
  [465 512]]

 [[125 112]
  [123  76]
  [228 226]
  [465 512]]]
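Comparing the two printed results, both contain the same rows, only arranged differently: dot_result[i, k] is row i of fourbyfour multiplied by matrix k of the stack, which matmul stores as matmul_result[k, i]. For this 2-D by 3-D case the two results should therefore differ only by a swap of the first two axes. A quick check, reusing the arrays from the answer:

```python
import numpy as np

fourbyfour = np.array([[1, 2, 3, 4], [3, 2, 1, 4], [5, 4, 6, 7], [11, 12, 13, 14]])
threebyfourbytwo = np.array([
    [[2, 3], [11, 9], [32, 21], [28, 17]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
])

dot_result = np.dot(fourbyfour, threebyfourbytwo)        # shape (4, 3, 2)
matmul_result = np.matmul(fourbyfour, threebyfourbytwo)  # shape (3, 4, 2)

# Swapping the first two axes of the matmul result reproduces the dot result.
print(np.array_equal(dot_result, matmul_result.transpose(1, 0, 2)))
```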

Answer 3

Mathematically, I think dot in numpy makes more sense:

$\mathrm{dot}(a,b)_{i,j,k,p,q,c} = \sum_m a_{i,j,k,m}\, b_{p,q,m,c}$

because it gives the dot product when a and b are vectors, and matrix multiplication when a and b are matrices.

The matmul operation in numpy keeps only part of the dot result and can be defined as

$\mathrm{matmul}(a,b)_{i,j,k,c} = \sum_m a_{i,j,k,m}\, b_{i,j,m,c}$

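So you can see that matmul(a,b) returns an array with a smaller shape, which uses less memory and makes more sense in applications. In particular, combined with broadcasting, you can get, for example,

$\mathrm{matmul}(a,b)_{i,j,k,l} = \sum_m a_{i,j,k,m}\, b_{j,m,l}$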

From the two definitions above you can see what each operation requires. Suppose a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4).

To use dot(a,b) you need
1. t3=s4;
To use matmul(a,b) you need
1. t3=s4
2. t2=s2, or one of t2 and s2 is 1
3. t1=s1, or one of t1 and s1 is 1
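A short sketch of these requirements. The shapes are arbitrary examples chosen to hit each rule: b_ok satisfies all three matmul conditions (t1 is 1, so it broadcasts against s1), while b_bad has t1=7 against s1=2, which only dot tolerates:

```python
import numpy as np

a = np.ones((2, 3, 4, 5))       # (s1, s2, s3, s4)
b_ok = np.ones((1, 3, 5, 6))    # t1 = 1, t2 = s2, t3 = s4
b_bad = np.ones((7, 3, 5, 6))   # t1 = 7: differs from s1 = 2 and neither is 1

ok_shape = np.matmul(a, b_ok).shape   # stack axes (2, 3) and (1, 3) broadcast to (2, 3)
dot_shape = np.dot(a, b_bad).shape    # dot only needs t3 == s4
print(ok_shape, dot_shape)

matmul_failed = False
try:
    np.matmul(a, b_bad)               # stack axes (2, 3) and (7, 3) cannot broadcast
except ValueError as err:
    matmul_failed = True
    print("matmul failed:", err)
```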

Use the following code to convince yourself.


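import numpy as np

for it in range(10000):
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)  # shape (5, 6, 2, 3)
    d = np.dot(a, b)     # shape (5, 6, 2, 6, 3)
    # print('c shape:', c.shape, 'd shape:', d.shape)

    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    # matmul pairs a[i, j] with b[j], i.e. the entry of d whose b-index equals j;
                    # use a tolerance because the two routines may sum in a different order
                    if abs(c[i, j, k, l] - d[i, j, k, j, l]) > 1e-12:
                        print(it, i, j, k, l, c[i, j, k, l], d[i, j, k, j, l])  # you will not see this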
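The nested loops can also be replaced by one vectorized comparison. In np.einsum, repeating a subscript within a single operand selects the diagonal along those axes, so 'ijkjl->ijkl' pulls out exactly the entries d[i, j, k, j, l] that the loops compare. A sketch for a single random draw:

```python
import numpy as np

a = np.random.rand(5, 6, 2, 4)
b = np.random.rand(6, 4, 3)

c = np.matmul(a, b)   # (5, 6, 2, 3)
d = np.dot(a, b)      # (5, 6, 2, 6, 3)

# The repeated j takes the diagonal over d's axes 1 and 3.
d_diag = np.einsum('ijkjl->ijkl', d)
print(np.allclose(c, d_diag))
```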

Answer 4

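FYI, @ and its numpy equivalents dot and matmul are all equally fast for the matrix-vector product benchmarked below. (Plot created with perfplot, a project of mine.)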

Code to reproduce the plot:

import perfplot
import numpy


def setup(n):
    A = numpy.random.rand(n, n)
    x = numpy.random.rand(n)
    return A, x


def at(data):
    A, x = data
    return A @ x


def numpy_dot(data):
    A, x = data
    return numpy.dot(A, x)


def numpy_matmul(data):
    A, x = data
    return numpy.matmul(A, x)


perfplot.show(
    setup=setup,
    kernels=[at, numpy_dot, numpy_matmul],
    n_range=[2 ** k for k in range(15)],
)
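If perfplot is not installed, a rough single-size comparison can be made with the standard library's timeit module. This sketch assumes an arbitrary size n = 1000 and reports the best of five runs; absolute numbers depend on the machine and on the BLAS library NumPy is linked against, so no figures are given here:

```python
import timeit
import numpy as np

n = 1000  # one arbitrary problem size; perfplot sweeps many
A = np.random.rand(n, n)
x = np.random.rand(n)

candidates = {
    "A @ x": lambda: A @ x,
    "np.dot(A, x)": lambda: np.dot(A, x),
    "np.matmul(A, x)": lambda: np.matmul(A, x),
}

reference = A @ x
for name, fn in candidates.items():
    # All three compute the same matrix-vector product.
    assert np.allclose(fn(), reference)
    best = min(timeit.repeat(fn, number=100, repeat=5)) / 100
    print(f"{name:>16}: {best * 1e6:.1f} us per call")
```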

Answer 5

My experience with MATMUL and DOT

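I kept getting "ValueError: Shape of passed values is (200, 1), indices imply (200, 3)" when trying to use MATMUL. I wanted a quick workaround and found that DOT provides the same functionality. I got no errors using DOT, and I got the correct answer.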

With MATMUL

X.shape
>>>(200, 3)

type(X)

>>>pandas.core.frame.DataFrame

w

>>>array([0.37454012, 0.95071431, 0.73199394])

YY = np.matmul(X,w)

>>>  ValueError: Shape of passed values is (200, 1), indices imply (200, 3)

With DOT

YY = np.dot(X,w)
# no error message
YY
>>>array([ 2.59206877,  1.06842193,  2.18533396,  2.11366346,  0.28505879, …

YY.shape

>>> (200, )
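The message "Shape of passed values is ..., indices imply ..." is raised by pandas when it builds a DataFrame, not by NumPy, which suggests pandas is trying to wrap the matmul result back into a frame with the original three columns. Two possible workarounds, sketched with a random stand-in for X (the real data is not shown above) and the w printed above: pass np.matmul a plain ndarray via DataFrame.to_numpy() (pandas 0.24 or newer), or use the @ operator on the DataFrame itself, which pandas handles and which returns a Series:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 3)))  # stand-in for the 200x3 DataFrame above
w = np.array([0.37454012, 0.95071431, 0.73199394])

# Option 1: give np.matmul a plain ndarray instead of the DataFrame.
y_np = np.matmul(X.to_numpy(), w)       # ndarray, shape (200,)

# Option 2: let pandas do it; DataFrame @ 1-D array returns a Series.
y_series = X @ w

print(y_np.shape, type(y_series).__name__, np.allclose(y_np, y_series.to_numpy()))
```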

Answer 6

Here is a comparison with np.einsum that shows how the indices are mapped:

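import numpy as np
a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)
np.allclose(np.einsum('ijk,ijk->ijk', a, b), a*b)        # True: elementwise product
np.allclose(np.einsum('ijk,ikl->ijl', a, b), a@b)        # True: @ / matmul
np.allclose(np.einsum('ijk,lkm->ijlm', a, b), a.dot(b))  # True: dot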
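The same einsum view extends to matmul's broadcasting: writing the stack axes as '...' lets einsum broadcast them the way @ does. A sketch with arbitrary shapes where the stack axes (2, 3) and (3,) must broadcast:

```python
import numpy as np

a = np.random.rand(2, 3, 4, 5)
b = np.random.rand(3, 5, 6)

# '...' stands for the leading stack axes; einsum broadcasts them like matmul does,
# and the last two axes are contracted as an ordinary matrix product.
via_einsum = np.einsum('...ij,...jk->...ik', a, b)
print(via_einsum.shape, np.allclose(via_einsum, a @ b))   # (2, 3, 4, 6) True
```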
