
Memory usage of tensorflow conv2d with large filters

I have a tensorflow model with some relatively large 135 x 135 x 1 x 3 convolution filters. I find that tf.nn.conv2d becomes unusable with filters this large - it attempts to use well over 60GB of memory, at which point I have to kill it. Here is a minimal script to reproduce my error:

import tensorflow as tf
import numpy as np

frames, height, width, channels = 200, 321, 481, 1
filter_h, filter_w, filter_out = 5, 5, 3  # With this, output has shape (200, 317, 477, 3)
# filter_h, filter_w, filter_out = 7, 7, 3  # With this, output has shape (200, 315, 475, 3)
# filter_h, filter_w, filter_out = 135, 135, 3  # With this, output will be smaller than the above with shape (200, 187, 347, 3), but memory usage explodes

images = np.random.randn(frames, height, width, channels).astype(np.float32)

filters = tf.Variable(np.random.randn(filter_h, filter_w, channels, filter_out).astype(np.float32))
images_input = tf.placeholder(tf.float32)
conv = tf.nn.conv2d(images_input, filters, strides=[1, 1, 1, 1], padding="VALID")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(conv, feed_dict={images_input: images})

print(result.shape)

First, can anyone explain this behaviour? Why does memory usage blow up with filter size? (Note: I also tried rearranging my dimensions to use a single conv3d instead of a batch of conv2ds, but this had the same problem.)

Second, can anyone suggest a solution other than, say, breaking the operation up into 200 separate single-image convolutions?

Edit: After re-reading the docs on tf.nn.conv2d(), I noticed this in the explanation of how it works:

  1. Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
  2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
  3. For each patch, right-multiplies the filter matrix and the image patch vector.

I had originally taken this simply as a description of the process, but if tensorflow is actually extracting and storing separate filter-sized 'patches' from the image under the hood, then a back-of-the-envelope calculation shows that the intermediate computation involved requires ~130GB in my case, well over the limit that I could test. This might answer my first question, but if so, can anyone explain why TF would do this when I'm still only debugging on a CPU?
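
For reference, the back-of-the-envelope estimate goes roughly like this, assuming the full patch tensor from step 2 above is materialised as float32 in one go; the real kernel may tile the work, so the exact figure depends on what you assume gets held in memory at once, but every variant lands far beyond ordinary RAM:

# Rough size of the "virtual" patches tensor from step 2 of the docs,
# for the 135 x 135 filter case, assuming it is materialised in full as float32.
frames, height, width, channels = 200, 321, 481, 1
filter_h, filter_w = 135, 135

out_h = height - filter_h + 1   # 187 with VALID padding
out_w = width - filter_w + 1    # 347 with VALID padding

patch_size = filter_h * filter_w * channels   # values per patch
num_patches = frames * out_h * out_w          # one patch per output position

print("virtual patches tensor shape: (%d, %d, %d, %d)" % (frames, out_h, out_w, patch_size))
print("approx. %.0f GB as float32" % (num_patches * patch_size * 4 / 1e9))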

As you figured out yourself, this is the reason for the big memory consumption. Tensorflow does this because the filters are usually small, and computing a matrix multiplication is a lot faster than computing a convolution directly.
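
To make this concrete, here is a small sketch (TF 1.x API, as in the question) that expresses the same convolution as "extract patches, then one matrix multiplication", which is essentially what the quoted docs describe; the patches tensor is the intermediate that explodes for large filters. This is only an illustration of the idea, not the actual kernel implementation:

import tensorflow as tf
import numpy as np

# Tiny batch and a small filter so the patches tensor easily fits in RAM.
frames, height, width, channels = 4, 321, 481, 1
filter_h, filter_w, filter_out = 5, 5, 3

images = np.random.randn(frames, height, width, channels).astype(np.float32)
filters = np.random.randn(filter_h, filter_w, channels, filter_out).astype(np.float32)

images_input = tf.placeholder(tf.float32, [None, height, width, channels])

# Step 2 of the docs: one flattened patch per output position,
# shape (batch, out_h, out_w, filter_h * filter_w * channels).
patches = tf.extract_image_patches(images_input,
                                   ksizes=[1, filter_h, filter_w, 1],
                                   strides=[1, 1, 1, 1],
                                   rates=[1, 1, 1, 1],
                                   padding="VALID")

# Steps 1 and 3 of the docs: flatten the filter and right-multiply each patch.
filter_mat = tf.reshape(tf.constant(filters),
                        [filter_h * filter_w * channels, filter_out])
conv_via_matmul = tf.tensordot(patches, filter_mat, axes=1)

# The built-in op, for comparison.
conv_builtin = tf.nn.conv2d(images_input, tf.constant(filters),
                            strides=[1, 1, 1, 1], padding="VALID")

with tf.Session() as sess:
    a, b = sess.run([conv_via_matmul, conv_builtin],
                    feed_dict={images_input: images})
    print(np.allclose(a, b, atol=1e-4))  # the two formulations agree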

"can anyone explain why TF would do this when I'm still only debugging on a CPU?"

You can also use tensorflow without a GPU, so the CPU implementations are not just there for debugging. They are optimised for speed as well, and matrix multiplication is faster on both CPU and GPU.

To make convolutions with large filters feasible, you would have to implement a convolution for large filters in C++ and add it as a new op to tensorflow.