Tensorflow Strides Argument

Question

I am trying to understand the strides argument in tf.nn.avg_pool, tf.nn.max_pool, tf.nn.conv2d.

The documentation repeatedly says

strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.

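My questions are:

1. What do each of the 4+ integers represent?
2. Why must they have strides[0] = strides[3] = 1 for convnets?
3. In this example we see tf.reshape(_X,shape=[-1, 28, 28, 1]). Why -1?

Sadly, the examples in the docs for reshape using -1 don't translate too well to this scenario.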

Answer 1

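The input is 4-dimensional and is of the form: [batch_size, image_rows, image_cols, number_of_colors]

Strides, in general, define an overlap between applying operations. In the case of conv2d, the stride specifies the distance between consecutive applications of the convolutional filter. A value of 1 in a specific dimension means that we apply the operator at every row/col, a value of 2 means every second one, and so on.

Re 1) The values that matter for convolutions are the 2nd and 3rd, and they represent the overlap in the application of the convolutional filters along rows and columns. A value of [1, 2, 2, 1] says that we want to apply the filters on every second row and every second column.

Re 2) I don't know the technical limitations (it might be a CuDNN requirement), but typically people use strides along the row or column dimensions. It doesn't necessarily make sense to do it over the batch size. I am not sure about the last dimension.

Re 3) Setting -1 for one of the dimensions means "set the value for the first dimension so that the total number of elements in the tensor is unchanged". In our case, the -1 will be equal to the batch_size.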
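To make the -1 rule concrete, here is a minimal pure-Python sketch of how a -1 entry can be resolved from the total element count. The helper name resolve_shape and the batch of 100 flattened 28x28 images are assumptions made for illustration; this is not TensorFlow's implementation, only the arithmetic behind it.

```python
def resolve_shape(num_elements, shape):
    """Replace a single -1 in `shape` so the total element count is preserved."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension can be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    # The missing dimension is whatever is left after dividing out the known ones.
    return [num_elements // known if dim == -1 else dim for dim in shape]


# Hypothetical batch: 100 flattened 28x28 single-channel images.
resolved = resolve_shape(100 * 784, [-1, 28, 28, 1])
print(resolved)  # [100, 28, 28, 1]
```

With a different batch size, only the first entry of the result changes, which is why the reshape call itself never has to mention the batch size.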

Answer 2

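Pooling and convolutional ops slide a "window" across the input tensor. Using tf.nn.conv2d as an example: if the input tensor has 4 dimensions, [batch, height, width, channels], then the convolution operates on a 2D window over the height, width dimensions.

strides determines how much the window shifts by in each of the dimensions. The typical use sets the first (the batch) and last (the depth) stride to 1.

Let's use a very concrete example: running a 2-d convolution over a 32x32 greyscale input image. I say greyscale because then the input image has depth=1, which helps keep it simple. Let that image look like this: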

00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...

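Let's run a 2x2 convolution window over a single example (batch size = 1). We'll give the convolution an output channel depth of 8.

The input to the convolution has shape=[1, 32, 32, 1].

If you specify strides=[1,1,1,1] with padding=SAME, then the output of the filter will be [1, 32, 32, 8].

The filter will first create an output for: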

F(00 01
  10 11)

And then for:

F(01 02
  11 12)

and so on. Then it will move to the second row, calculating:

F(10, 11
  20, 21)

then

F(11, 12
  21, 22)

If you specify a stride of [1, 2, 2, 1], it won't do overlapping windows. It will compute:

F(00, 01
  10, 11)

and then

F(02, 03
  12, 13)

The stride operates similarly for the pooling operators.
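The window walk above can be reproduced with a small pure-Python sketch. The helper name windows and the 4x4 image are assumptions for illustration, and padding is ignored for brevity, so this only shows which patches a 2x2 window visits for a given spatial stride, not what tf.nn.conv2d computes.

```python
def windows(image, k, stride):
    """Return the k x k patches visited by a window sliding with `stride` (no padding)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - k + 1, stride):
        for c in range(0, cols - k + 1, stride):
            # Take k consecutive values from each of k consecutive rows.
            patches.append([image[r + dr][c:c + k] for dr in range(k)])
    return patches


# Pixel (r, c) holds 10 * r + c, mirroring the 00 01 02 ... labelling in the text.
image = [[10 * r + c for c in range(4)] for r in range(4)]

stride1 = windows(image, k=2, stride=1)  # overlapping windows
stride2 = windows(image, k=2, stride=2)  # non-overlapping windows

print(stride1[:2])  # [[[0, 1], [10, 11]], [[1, 2], [11, 12]]]
print(stride2[:2])  # [[[0, 1], [10, 11]], [[2, 3], [12, 13]]]
```

The two spatial strides here play the role of the middle entries of [1, x, y, 1]; the outer 1s are not modelled because the sketch has no batch or channel axis.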

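Question 2: Why strides [1, x, y, 1] for convnets

The first 1 is the batch: you don't usually want to skip over examples in your batch, or you shouldn't have included them in the first place. :)

The last 1 is the depth of the convolution: you don't usually want to skip inputs, for the same reason.

The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that's not a typical use in convnets. The typical use is to use them spatially.

Question 3: Why reshape to -1

-1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code independent of the input batch size, so that you can change your pipeline and not have to adjust the batch size everywhere in the code.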

Answer 3

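Let's start with what stride does in the 1-dim case.

Let's assume your input = [1, 0, 2, 3, 0, 1, 1] and kernel = [2, 1, 3]. The result of the convolution is [8, 11, 7, 9, 4], which is calculated by sliding your kernel over the input, performing element-wise multiplication and summing everything. Like this: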

8 = 1 * 2 + 0 * 1 + 2 * 3
11 = 0 * 2 + 2 * 1 + 3 * 3
7 = 2 * 2 + 3 * 1 + 0 * 3
9 = 3 * 2 + 0 * 1 + 1 * 3
4 = 0 * 2 + 1 * 1 + 1 * 3

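Here we slide by one element, but nothing stops you from using any other number. This number is your stride. You can think about it as downsampling the result of the 1-strided convolution by just taking every s-th result.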
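The 1-dim example and the downsampling view can be checked with a short pure-Python sketch. The function name conv1d is an assumption for illustration; like the worked sums above, it does not flip the kernel and uses no padding.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal`, multiplying element-wise and summing each window."""
    k = len(kernel)
    return [
        sum(x * w for x, w in zip(signal[start:start + k], kernel))
        for start in range(0, len(signal) - k + 1, stride)
    ]


signal = [1, 0, 2, 3, 0, 1, 1]
kernel = [2, 1, 3]

full = conv1d(signal, kernel)               # stride 1
strided = conv1d(signal, kernel, stride=2)  # stride 2

print(full)     # [8, 11, 7, 9, 4]
print(strided)  # [8, 7, 4]
```

Taking every second element of the stride-1 result, full[::2], gives the same list as the stride-2 call.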

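Knowing the input size i, kernel size k, stride s and padding p, you can easily calculate the output size o of the convolution as:

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$

Here the ⌊ ⌋ operator denotes the floor operation. The same formula applies to pooling layers, with k the size of the pooling window.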
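As a quick check of the output-size calculation, here is a minimal sketch that assumes the standard relation o = floor((i + 2p - k) / s) + 1 for a convolution without dilation, with p the padding added on each side. The function name conv_output_size is an assumption for illustration.

```python
def conv_output_size(i, k, s=1, p=0):
    """Output length for input size i, kernel size k, stride s, per-side padding p."""
    # Integer division floors the result for the non-negative values used here.
    return (i + 2 * p - k) // s + 1


print(conv_output_size(7, 3))        # 5, the 1-dim example above
print(conv_output_size(7, 3, s=2))   # 3
print(conv_output_size(32, 2, s=2))  # 16
```

The floor matters when the stride does not divide evenly: for i = 8, k = 3, s = 2 the window fits at positions 0, 2 and 4 only, so the output size is 3.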

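N-dim case.

Knowing the math for the 1-dim case, the n-dim case is easy once you see that each dimension is independent. So you just slide along each dimension separately. Here is an example for 2-d. Notice that you do not need to have the same stride in all the dimensions. So for an N-dim input/kernel you should provide N strides.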
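Because each dimension is handled independently, the output shape for N dimensions is just the 1-dim calculation applied per axis. The sketch below assumes no padding; the function name output_shape and the 6x8 input are made up for illustration.

```python
def output_shape(input_shape, kernel_shape, strides):
    """Apply the unpadded 1-dim output-size rule to each dimension separately."""
    if not (len(input_shape) == len(kernel_shape) == len(strides)):
        raise ValueError("need one kernel size and one stride per dimension")
    return [(i - k) // s + 1 for i, k, s in zip(input_shape, kernel_shape, strides)]


# A 6x8 input, a 2x2 kernel, stride 1 along rows and stride 3 along columns.
shape_2d = output_shape([6, 8], [2, 2], [1, 3])
print(shape_2d)  # [5, 3]
```

The strides differ per axis here, which is the point made above: N dimensions take N independent strides.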

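So now it is easy to answer all your questions:

1. What do each of the 4+ integers represent? The docs for conv2d and pool tell you: this list represents the strides along each dimension. Notice that the length of the strides list is the same as the rank of the kernel tensor.
2. Why must they have strides[0] = strides[3] = 1 for convnets? The first dimension is the batch size, the last is channels. There is no point in skipping either batch or channel, so you make them 1. For width/height you can skip something, and that's why they might not be 1.
3. tf.reshape(_X,shape=[-1, 28, 28, 1]). Why -1? The tf.reshape documentation has it covered for you: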

If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant. In particular, a shape of [-1] flattens into 1-D. At most one component of shape can be -1.

Answer 4

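@dga has done a wonderful job explaining, and I can't be thankful enough for how helpful it has been. In like manner, I would like to share my findings on how stride works in 3D convolution.

According to the TensorFlow documentation on conv3d, the shape of the input must be in this order:

[batch, in_depth, in_height, in_width, in_channels]

Let's explain the variables from the extreme right to the left using an example. Assume the input shape is input_shape = [1000,16,112,112,3]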

input_shape[4] is the number of colour channels (RGB or whichever format it is extracted in)
input_shape[3] is the width of the image
input_shape[2] is the height of the image
input_shape[1] is the number of frames that have been lumped into 1 complete data
input_shape[0] is the number of lumped frames of images we have.

Below is a summary of the documentation on how stride is used.

strides: A list of ints that has length >= 5. 1-D tensor of length 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1

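As indicated in many works, strides simply mean how many steps away a window or kernel jumps from the closest element, be it a data frame or a pixel (this is paraphrased, by the way).

From the documentation above, a stride in 3D will look like this: strides = (1,X,Y,Z,1).

The documentation emphasizes that strides[0] = strides[4] = 1.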

strides[0]=1 means that we do not want to skip any data in the batch 
strides[4]=1 means that we do not want to skip in the channel

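strides[X] means how many skips we should make in the lumped frames. So, for example, if we have 16 frames, X=1 means use every frame, X=2 means use every second frame, and so on.

strides[y] and strides[z] follow the explanation by @dga, so I will not redo that part.

In keras, however, you only need to specify a tuple/list of 3 integers, specifying the strides of the convolution along each spatial dimension, where the spatial dimensions are strides[x], strides[y] and strides[z]. strides[0] and strides[4] already default to 1.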
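The relationship between the 3-integer form and the 5-integer form can be sketched in pure Python. Both helper names below are hypothetical and are not part of TensorFlow or Keras; frame_starts only lists where a window would start along the frame axis and ignores kernel depth and padding.

```python
def full_conv3d_strides(spatial_strides):
    """Expand 3 spatial strides (X, Y, Z) into the 5-element (1, X, Y, Z, 1) form."""
    x, y, z = spatial_strides
    # Batch and channel strides are fixed to 1, as the quoted docs require.
    return [1, x, y, z, 1]


def frame_starts(num_frames, frame_stride):
    """Frame indices at which a window starts when moving by `frame_stride`."""
    return list(range(0, num_frames, frame_stride))


print(full_conv3d_strides((2, 1, 1)))  # [1, 2, 1, 1, 1]
print(frame_starts(16, 2))             # [0, 2, 4, 6, 8, 10, 12, 14]
```

With a frame stride of 1 every one of the 16 frames is a starting position, which matches the "use every frame" case described above.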

Hope someone finds this useful!
