Why does TensorFlow calculate 2D convolutions when 1D convolution is called?

Question

The documentation of tf.nn.conv1d states:

Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.

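I understand that the operations are equivalent, but I am a bit confused about the implications of this implementation detail.

Does the reshaping create some computational overhead? The 3D convolution has its own implementation, so why doesn't the 1D convolution?

Thanks for any explanation that helps me and others understand this implementation detail of TensorFlow!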

Answer 1

From studying the source code, I conclude that this was probably done for convenience and minimalism of implementation; details below.

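First, there is no real "reshaping", only expanding, squeezing, and re-ordering of dims, which carries negligible overhead: no array elements are actually moved in memory; only the tensor object's indexing specifiers change.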
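As a rough NumPy analogy for the point above (this is not TensorFlow's actual code path, and the shapes are made up for illustration), adding and removing a size-1 dimension returns a view over the same buffer rather than a copy:

```python
import numpy as np

# Illustrative shape only: [batch, in_width, in_channels]
x = np.zeros((4, 10, 3), dtype=np.float32)

# Insert a height dimension of size 1: [batch, 1, in_width, in_channels]
x4 = np.expand_dims(x, axis=1)

# Drop it again: back to [batch, in_width, in_channels]
back = np.squeeze(x4, axis=1)

# Both results are views on the original buffer; no elements were copied.
print(np.shares_memory(x, x4), np.shares_memory(x, back))

# Writing through the 4D view is visible in the 3D array.
x4[0, 0, 0, 0] = 1.0
print(x[0, 0, 0])
```

Only the shape and stride metadata differ between x and x4, which is why the cost does not depend on the number of elements.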

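Second, all conv ops ultimately route to tf.nn_ops.convolution_internal, which then routes to either gen_nn_ops.conv2d or gen_nn_ops.conv3d; there is no conv1d in gen_nn_ops.py. Note that, for some reason, you won't find that file in the Git repository, but it should be in your local install, /python/ops/gen_nn_ops.py.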

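Lastly, to get a real answer on why there is no dedicated conv1d implementation, you would need to ask the cuDNN developers behind the convolution algorithms found in gen_nn_ops.py; it's possible that they found no performance improvement, and that conv2d works just as fast. From a low-level standpoint this makes sense, since the number of matrix multiplications involved in sliding a kernel with N x 1 elements along an M x 1 input is identical to that of sliding N along M; again, the only difference is in indexing.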
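To make the "only the indexing differs" argument concrete, here is a minimal NumPy sketch. It assumes stride 1, "valid" padding and channels-last layout, and the helper names conv1d_valid and conv2d_valid are hypothetical naive loops written for this illustration, not TensorFlow or cuDNN functions. It checks that a 1D convolution gives the same result as a 2D convolution over an input of height 1 with a filter of height 1:

```python
import numpy as np

def conv1d_valid(x, w):
    # x: [batch, in_width, in_channels]; w: [filter_width, in_channels, out_channels]
    fw = w.shape[0]
    out_width = x.shape[1] - fw + 1
    out = np.empty((x.shape[0], out_width, w.shape[2]))
    for i in range(out_width):
        # Contract each window over (filter_width, in_channels).
        out[:, i, :] = np.tensordot(x[:, i:i + fw, :], w, axes=([1, 2], [0, 1]))
    return out

def conv2d_valid(x, w):
    # x: [batch, in_height, in_width, in_channels]
    # w: [filter_height, filter_width, in_channels, out_channels]
    fh, fw = w.shape[:2]
    out_h = x.shape[1] - fh + 1
    out_w = x.shape[2] - fw + 1
    out = np.empty((x.shape[0], out_h, out_w, w.shape[3]))
    for r in range(out_h):
        for c in range(out_w):
            patch = x[:, r:r + fh, c:c + fw, :]
            out[:, r, c, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10, 3))   # [batch, in_width, in_channels]
w = rng.standard_normal((4, 3, 5))    # [filter_width, in_channels, out_channels]

y1 = conv1d_valid(x, w)
# Add a height axis of size 1 to input and filter, run the 2D version, drop the axis.
y2 = conv2d_valid(x[:, None, :, :], w[None, :, :, :])[:, 0, :, :]

print(y1.shape, np.allclose(y1, y2))
```

With a height of 1 the outer loop of the 2D version runs exactly once, so both versions perform the same window contractions in the same order.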

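Unfortunately, the devs decided to encapsulate the ultimate call, that is, _pywrap_tensorflow_internal.TFE_Py_FastPathExecute; the module consists of a .lib and a .pyd file, which is basically compiled C (Cython) code that requires disassembly for introspection.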

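TL;DR (1) the "reshaping" has negligible overhead; (2) a dedicated conv1d implementation is probably absent to spare redundancy, since conv2d is just as fast; (3) I'm not a cuDNN expert, so if you need to be sure, better ask over at cuDNN, or read their SDK Documentation. Alternatively, a dev at TF Github may help. I haven't seen cuDNN devs answer on SO for years now, so posting here may not be the best bet.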

Dim reordering performance demo:

import numpy as np
from time import time

x = np.random.randn(700, 800, 900) # 504,000,000 elements

t0 = time()
for i in range(1000):
    if i % 2 == 0:
        x = x.reshape(700, 900, 800)
    else:
        x = x.reshape(700, 800, 900)
print(time() - t0)

0.0009968280792236328


convolution

conv-neural-network

tensorflow