About max-pooling?

Max-pooling is useful in vision for two reasons:

By eliminating non-maximal values, it reduces computation for upper layers.

It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.

I don't understand this. What does "8 directions" mean?

And what does the following mean?

"If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8."

There are 8 directions in which one can translate the input image by a single pixel.

They are considering 2 horizontal, 2 vertical, and 4 diagonal 1-pixel shifts. That gives 8 in total.

If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

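Suppose we take the max over a 2x2 region of the image. Here the pooling is applied to the image before the convolution, although that does not matter for the purposes of this explanation.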

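Regardless of where exactly the maximum sits within the 2x2 region, there are 3 possible 1-pixel translations of the image that leave the maximum inside that particular 2x2 region. Of course, a larger value may be brought in from a neighboring region, but that is beside the point. The point is that you get some translation invariance.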
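To make the 2x2 case concrete, here is a minimal sketch in plain Python. It assumes a 6x6 image that is zero everywhere except for a single peak, non-overlapping 2x2 pooling, and zero fill at the border when shifting; the helper names `max_pool_2x2` and `shift` are made up for this illustration. Because the background is zero, no larger value can come in from a neighboring region.

```python
from itertools import product

def max_pool_2x2(img):
    """Non-overlapping 2x2 max-pooling on a list-of-lists image."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]

def shift(img, dr, dc):
    """Translate the image by (dr, dc) pixels, filling vacated cells with 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = img[r][c]
    return out

# Single peak at the top-left cell of the pooling block covering rows 2-3, cols 2-3.
img = [[0] * 6 for _ in range(6)]
img[2][2] = 1

# The 8 one-pixel shifts: 2 horizontal, 2 vertical, 4 diagonal.
shifts = [(dr, dc) for dr, dc in product((-1, 0, 1), repeat=2) if (dr, dc) != (0, 0)]

reference = max_pool_2x2(img)
unchanged = [s for s in shifts if max_pool_2x2(shift(img, *s)) == reference]
print(len(unchanged), unchanged)  # 3 [(0, 1), (1, 0), (1, 1)]
```

Only the three shifts that move the peak right, down, or diagonally down-right keep it in the same 2x2 block, so only those leave the pooled output unchanged. Placing the peak in a different cell of the block changes which three shifts survive, but not the count.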

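For a 3x3 region it gets more complicated, because the number of 1-pixel translations that keep the maximum within the region depends on exactly where in the region the maximum lies. The 5 translations they mention correspond to a position in the middle of an edge of the 3x3 pixel block. A corner position gives 3 translations, while the center position gives all 8.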
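The counts for each position can be checked by brute force. The sketch below assumes we only ask whether a cell stays inside its k x k window after a shift, and ignores larger values arriving from neighboring regions; `shifts_staying_inside` and `count_table` are hypothetical helper names.

```python
from itertools import product

# The 8 one-pixel shifts: 2 horizontal, 2 vertical, 4 diagonal.
SHIFTS = [(dr, dc) for dr, dc in product((-1, 0, 1), repeat=2) if (dr, dc) != (0, 0)]

def shifts_staying_inside(k, row, col):
    """Count the shifts that keep cell (row, col) inside a k x k window."""
    return sum(
        0 <= row + dr < k and 0 <= col + dc < k
        for dr, dc in SHIFTS
    )

def count_table(k):
    """Per-cell count for every position of the maximum in a k x k window."""
    return [[shifts_staying_inside(k, r, c) for c in range(k)] for r in range(k)]

print(count_table(2))  # [[3, 3], [3, 3]]
print(count_table(3))  # [[3, 5, 3], [5, 8, 5], [3, 5, 3]]
```

Every cell of the 2x2 window gives 3. In the 3x3 window, corners give 3, edge midpoints give 5, and the center gives 8, so the quoted 5/8 matches the edge-midpoint case.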


image-processing

deep-learning

conv-neural-network

max-pooling