About max-pooling?
Max-pooling is useful in vision for two reasons:
By eliminating non-maximal values, it reduces computation for upper
layers.
It provides a form of translation invariance. Imagine cascading a
max-pooling layer with a convolutional layer. There are 8 directions
in which one can translate the input image by a single pixel. If
max-pooling is done over a 2x2 region, 3 out of these 8 possible
configurations will produce exactly the same output at the
convolutional layer. For max-pooling over a 3x3 window, this jumps to
5/8.
Since it provides additional robustness to position, max-pooling is a
“smart” way of reducing the dimensionality of intermediate
representations.
I don't understand. What does "8 directions" mean?
And what does
"If max-pooling is done over a 2x2 region, 3 out of these 8 possible
configurations will produce exactly the same output at the
convolutional layer. For max-pooling over a 3x3 window, this jumps to
5/8."
mean?
There are 8 directions in which one can translate the input image by a single pixel.
They are considering 2 horizontal, 2 vertical, and 4 diagonal 1-pixel shifts, which gives 8 in total.
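To make the count concrete, here is a minimal sketch that enumerates the shifts as (dy, dx) offsets. The names SHIFTS, horizontal, vertical and diagonal are made up for illustration and do not come from the quoted tutorial.

```python
from itertools import product

# Every (dy, dx) offset with each component in {-1, 0, 1},
# excluding (0, 0), which is no shift at all.
SHIFTS = [(dy, dx) for dy, dx in product((-1, 0, 1), repeat=2)
          if (dy, dx) != (0, 0)]

horizontal = [s for s in SHIFTS if s[0] == 0]                # left, right
vertical = [s for s in SHIFTS if s[1] == 0]                  # up, down
diagonal = [s for s in SHIFTS if s[0] != 0 and s[1] != 0]    # four corners

print(len(horizontal), len(vertical), len(diagonal), len(SHIFTS))
```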
If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.
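Suppose we take the maximum over a 2x2 region of the image. The image here is pre-convolution, although that does not matter for the purposes of this explanation.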
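Wherever the maximum lies within the 2x2 region, exactly 3 of the 8 possible 1-pixel translations of the image leave the maximum inside that same 2x2 region. Of course, a translation may bring in a larger value from a neighboring region, but that is beside the point. The point is that you get some translation invariance.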
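A rough way to check the 2x2 claim is to pool a toy image, shift it in each of the 8 directions, and pool again. The sketch below assumes non-overlapping 2x2 windows, zero fill at the borders, and a hypothetical 6x6 image holding a single bright pixel, so no larger neighboring value can interfere. The helpers max_pool and shift are written for this illustration only.

```python
from itertools import product

def max_pool(img, k):
    """Non-overlapping k x k max-pooling over a square list-of-lists image."""
    n = len(img)
    return [[max(img[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(0, n, k)]
            for r in range(0, n, k)]

def shift(img, dy, dx):
    """Translate the image by (dy, dx), filling vacated pixels with 0."""
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            if 0 <= r + dy < n and 0 <= c + dx < n:
                out[r + dy][c + dx] = img[r][c]
    return out

# Hypothetical test image: zeros with one bright pixel at (2, 2), the
# top-left corner of the pooling window covering rows 2-3 and columns 2-3.
image = [[0] * 6 for _ in range(6)]
image[2][2] = 1

baseline = max_pool(image, 2)

# Shifts whose pooled output is identical to the unshifted pooled output.
unchanged = [(dy, dx) for dy, dx in product((-1, 0, 1), repeat=2)
             if (dy, dx) != (0, 0)
             and max_pool(shift(image, dy, dx), 2) == baseline]

print(unchanged)  # the shifts that leave the pooled output unchanged
```

With the bright pixel in the top-left corner of its window, the surviving shifts are the ones that move it right, down, or diagonally down-right. Placing the pixel in another corner of the window selects a different set of 3 shifts.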
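For a 3x3 region it gets more complicated, because the number of 1-pixel translations that keep the maximum inside the region depends on exactly where in the region the maximum sits. The 5 translations they mention correspond to a position in the middle of an edge of the 3x3 pixel block. A corner position gives 3 translations, and the center position gives all 8.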
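The position dependence can be tabulated directly. The sketch below only counts, for each cell of a k x k window, how many of the 8 shifts keep a pixel that starts in that cell inside the window. It ignores the possibility of a larger value arriving from a neighboring region. The names shifts_kept and count_table are hypothetical.

```python
from itertools import product

def shifts_kept(k, row, col):
    """Count the 1-pixel shifts that keep a pixel at (row, col)
    inside a k x k window indexed from (0, 0)."""
    count = 0
    for dy, dx in product((-1, 0, 1), repeat=2):
        if (dy, dx) == (0, 0):
            continue  # not a shift
        if 0 <= row + dy < k and 0 <= col + dx < k:
            count += 1
    return count

def count_table(k):
    """Per-position counts for every cell of a k x k window."""
    return [[shifts_kept(k, r, c) for c in range(k)] for r in range(k)]

for row in count_table(2):
    print(row)
print()
for row in count_table(3):
    print(row)
```

For the 2x2 window every cell gives 3. For the 3x3 window the corners give 3, the edge midpoints give 5, and the center gives 8, which matches the counts described above.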