TensorFlow: implicit broadcasting in element-wise addition/multiplication

Question

How does implicit broadcasting with + and * work in TensorFlow?

Suppose I have two tensors such that

a.get_shape() = [64, 10, 1, 100]
b.get_shape() = [64, 100]
(a+b).get_shape() = [64, 10, 64, 100]
(a*b).get_shape() = [64, 10, 64, 100]

How does the result become [64, 10, 64, 100]?

Answer 1

According to the documentation, operations such as add are broadcasting operations.

Quoting the glossary:

Broadcasting operation

An operation that uses numpy-style broadcasting to make the shapes of its tensor arguments compatible.

NumPy-style broadcasting is described in detail in the NumPy documentation.

In short:

[...] the smaller array is “broadcast” across the larger array so that they have compatible shapes.

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.

Answer 2

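I don't think the broadcasting is happening the way you expect. It actually broadcasts in both directions. Let me show what I mean by modifying your example: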

import tensorflow as tf

a = tf.ones([64, 10, 1, 100])
b = tf.ones([128, 100])
print((a+b).shape) # prints "(64, 10, 128, 100)"

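This shows that broadcasting aligns the shapes starting from the last dimension. It implicitly tiles a along its third dimension to match the size of b's first dimension, then implicitly adds singleton dimensions to b and tiles b across a's first two dimensions.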
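To make the implicit tiling concrete, here is a minimal sketch that performs the same expansion by hand with tf.broadcast_to and compares it with the implicit result. The variable names (out_shape, a_full, b_full) and the use of random inputs are illustrative choices, not part of the original answer, and the sketch assumes TensorFlow 2.x in eager mode.

```python
import tensorflow as tf

a = tf.random.normal([64, 10, 1, 100])
b = tf.random.normal([128, 100])

# Implicit version: broadcasting happens inside the + op.
implicit = a + b

# Explicit version: expand both operands to the common shape first,
# then add two tensors of identical shape.
out_shape = [64, 10, 128, 100]
a_full = tf.broadcast_to(a, out_shape)  # repeats a along its third dimension
b_full = tf.broadcast_to(b, out_shape)  # adds two leading dimensions to b, then repeats it
explicit = a_full + b_full

# Both routes should give the same values.
same = bool(tf.reduce_all(tf.equal(implicit, explicit)))
print(implicit.shape, same)
```

The implicit form is normally preferable, since it does not need the expanded copies of a and b to be built as separate tensors first.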

I think what you want is to implicitly tile b along a's second dimension. For that, b needs a different shape:

a = tf.ones([64, 10, 1, 100])
b = tf.ones([64, 1, 1, 100])
print((a+b).shape) # prints "(64, 10, 1, 100)"

You can call tf.expand_dims() on b twice to add the two singleton dimensions needed to match this shape.
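A minimal sketch of that suggestion, starting from the [64, 100] tensor in the question. The name b_expanded is introduced here for illustration, and inserting both new axes at position 1 is one way to reach the target shape, assuming TensorFlow 2.x.

```python
import tensorflow as tf

a = tf.ones([64, 10, 1, 100])
b = tf.ones([64, 100])

# Insert a size-1 axis at position 1, twice:
# [64, 100] -> [64, 1, 100] -> [64, 1, 1, 100]
b_expanded = tf.expand_dims(tf.expand_dims(b, 1), 1)

# Now only b is stretched (along a's second dimension); a is used as-is.
result = a + b_expanded
print(b_expanded.shape)  # (64, 1, 1, 100)
print(result.shape)      # (64, 10, 1, 100)
```

Indexing with tf.newaxis, as in b[:, tf.newaxis, tf.newaxis, :], should produce the same [64, 1, 1, 100] shape in a single expression.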

Answer 3

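NumPy-style broadcasting is well documented, but here is a short explanation: the shapes of the two tensors are compared starting from the last dimension and working backward, and any dimension that is lacking in either tensor is replicated to match the other.

For example, with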

a.get_shape() = [64, 10, 1, 100]
b.get_shape() = [64, 100]
(a*b).get_shape() = [64, 10, 64, 100]

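a and b have the same last dimension (100). a's second-to-last dimension (size 1) is then replicated to match b's corresponding dimension (64). b lacks a's first two dimensions, so they are created.

Note that any lacking dimension must be of size 1 or absent, because the whole lower-level shape gets replicated along it.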
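The rule described above can be sketched in a few lines of plain Python that work on shapes only. The function name broadcast_shape and its error message are hypothetical, written for illustration; this is not TensorFlow's or NumPy's actual implementation.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dimension backward; a dimension
    # that is absent in the shorter shape is treated as size 1.
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            # Neither size is 1 and the sizes differ: nothing can be replicated.
            raise ValueError(f"incompatible dimensions: {dim_a} vs {dim_b}")
    return tuple(reversed(result))


print(broadcast_shape([64, 10, 1, 100], [64, 100]))  # (64, 10, 64, 100)

try:
    broadcast_shape([64, 10, 2, 100], [64, 100])
except ValueError as err:
    print(err)  # incompatible dimensions: 2 vs 64
```

The second call fails because a's third dimension is 2 rather than 1, so it cannot be stretched to match b's 64.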
