Different matrix multiplication behaviour between Keras and PyTorch

Question

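I am trying to understand how matrix multiplication works over two dimensions in DL frameworks, and I stumbled upon an article here. The author uses Keras to explain it, and it works for him. But when I try to reproduce the same code in PyTorch, it fails with the error shown in the output below.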

PyTorch code:

import torch

a = torch.ones((2,3,4))
b = torch.ones((7,4,5))
c = torch.matmul(a,b)
print(c.shape)

Output: RuntimeError: The size of tensor a (2) must match the size of tensor b (7) at non-singleton dimension 0

Keras code:

from keras import backend as K  # assumes the Keras 2.x backend API

a = K.ones((2,3,4))
b = K.ones((7,4,5))
c = K.dot(a,b)
print(c.shape)

Output: (2, 3, 7, 5)

Can anyone explain what I am doing wrong?

Answer 1

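Matrix multiplication (also known as the matrix dot product) is a well-defined algebraic operation that takes two 2D matrices.
Deep-learning frameworks (e.g., tensorflow, keras, pytorch) are tuned to operate on batches of matrices, so they usually implement batched matrix multiplication, that is, applying the matrix dot product to every pair of 2D matrices in a batch.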
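To make the batched behaviour concrete, here is a minimal sketch (the tensor values and variable names are chosen only for illustration). It checks that torch.matmul on two 3D tensors with the same batch size gives the same result as multiplying the matching 2D matrices one pair at a time:

```python
import torch

# Illustrative inputs: a batch of two 3x4 matrices and a batch of two 4x5 matrices.
# Small integer values keep the floating-point arithmetic exact.
a = torch.arange(24.0).reshape(2, 3, 4)
b = torch.arange(40.0).reshape(2, 4, 5)

# Batched product: a single call multiplies a[i] by b[i] for every batch index i.
batched = torch.matmul(a, b)

# The same computation written as an explicit loop over the batch.
looped = torch.stack([a[i] @ b[i] for i in range(a.shape[0])])

print(batched.shape)                 # torch.Size([2, 3, 5])
print(torch.equal(batched, looped))  # True
```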

The example you linked to shows how matmul processes a batch of matrices:

import tensorflow as tf

a = tf.ones((9, 8, 7, 4, 2))
b = tf.ones((9, 8, 7, 2, 5))
c = tf.matmul(a, b)  # batch dims (9, 8, 7) match, so c has shape (9, 8, 7, 4, 5)

Note that all but the last two dimensions are identical: (9, 8, 7).

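This is NOT the case in your example: the leading ("batch") dimensions differ (2 vs. 7, and neither is 1, so they cannot be broadcast), hence the error.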
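The difference between the two frameworks comes down to how the output shape is derived. The following pure-Python sketch compares the two shape rules; `matmul_shape` and `keras_dot_shape` are hypothetical helper names, not library functions. The first follows the documented broadcasting rule of torch.matmul for inputs with at least two dimensions. The second assumes that K.dot contracts the last axis of the first tensor with the second-to-last axis of the second, which is consistent with the (2, 3, 7, 5) output in the question.

```python
from itertools import zip_longest


def matmul_shape(a_shape, b_shape):
    """Hypothetical helper: output shape of a matmul-style batched product (>= 2 dims each)."""
    *a_batch, n, k = a_shape
    *b_batch, k2, m = b_shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    batch = []
    # Align the batch dimensions from the right; a missing dimension counts as 1.
    for x, y in zip_longest(reversed(a_batch), reversed(b_batch), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"batch dimensions {x} and {y} cannot be broadcast")
        batch.append(y if x == 1 else x)
    return tuple(reversed(batch)) + (n, m)


def keras_dot_shape(a_shape, b_shape):
    """Hypothetical helper: output shape assumed for a K.dot-style product (>= 2 dims each)."""
    if a_shape[-1] != b_shape[-2]:
        raise ValueError("contracted dimensions differ")
    # No batch pairing: every remaining axis of both inputs is kept.
    return tuple(a_shape[:-1]) + tuple(b_shape[:-2]) + (b_shape[-1],)


print(matmul_shape((2, 3, 4), (2, 4, 5)))     # (2, 3, 5)
print(keras_dot_shape((2, 3, 4), (7, 4, 5)))  # (2, 3, 7, 5)

try:
    matmul_shape((2, 3, 4), (7, 4, 5))
except ValueError as err:
    print(err)  # batch dimensions 2 and 7 cannot be broadcast
```

Under the broadcasting rule, a batch dimension of size 1 is acceptable, so shapes such as (1, 3, 4) and (7, 4, 5) would multiply to (7, 3, 5).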

Using the same leading dimension in pytorch:

import torch

a = torch.ones((2,3,4))
b = torch.ones((2,4,5))  # batch dimension now matches a (2 and 2)
c = torch.matmul(a,b)
print(c.shape)

results in

torch.Size([2, 3, 5])

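If you insist on dot products with different batch dimensions, you have to define explicitly how the two tensors are multiplied. You can do that with the very flexible torch.einsum: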

import torch

a = torch.ones((2,3,4))
b = torch.ones((7,4,5))
# sum over the shared index k; keep i, j from a and l, m from b
c = torch.einsum('ijk,lkm->ijlm', a, b)
print(c.shape)

which results in:

torch.Size([2, 3, 7, 5])
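As a rough check of what this einsum expression computes, the sketch below (with illustrative tensor values) verifies that each slice c[i, :, l, :] is the ordinary 2D product of a[i] and b[l]. It also shows two other ways to get the same result that should be equivalent, torch.tensordot and a broadcast torch.matmul followed by a permute. These alternatives are additions for comparison, not part of the original answer.

```python
import torch

# Illustrative inputs with small integer values so float arithmetic stays exact.
a = torch.arange(24.0).reshape(2, 3, 4)
b = torch.arange(140.0).reshape(7, 4, 5)

c = torch.einsum('ijk,lkm->ijlm', a, b)

# Every (i, l) pair holds the plain matrix product of a[i] (3x4) and b[l] (4x5).
pairwise_ok = all(
    torch.equal(c[i, :, l, :], a[i] @ b[l])
    for i in range(a.shape[0])
    for l in range(b.shape[0])
)

# Alternative 1: contract axis 2 of a with axis 1 of b.
c_tensordot = torch.tensordot(a, b, dims=([2], [1]))

# Alternative 2: add a singleton batch axis so matmul can broadcast
# (2, 1, 3, 4) with (7, 4, 5) into (2, 7, 3, 5), then reorder the axes.
c_broadcast = torch.matmul(a[:, None], b).permute(0, 2, 1, 3)

print(c.shape)                        # torch.Size([2, 3, 7, 5])
print(pairwise_ok)                    # True
print(torch.equal(c, c_tensordot))    # True
print(torch.equal(c, c_broadcast))    # True
```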
