Shape of image after MaxPooling2D with padding ='same' --calculating layer-by-layer shape in convolution autoencoder

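Put simply, my question is why the image size after a max-pooling layer is not the same as the input image size when I use padding='same' in my Keras code. I am following the Keras blog post "Building Autoencoders in Keras" and building a convolutional autoencoder. The autoencoder code is as follows: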

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

input_layer = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

According to autoencoder.summary(), the output after the first layer, Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer), is 28 X 28 X 16, i.e. the same spatial size as the input image. This is because the padding is 'same'.

In [49]: autoencoder.summary()
(Numbering of layers is given by me and not produced in output)
_________________________________________________________________
  Layer (type)                 Output Shape             Param #   
=================================================================
1.input_1 (InputLayer)         (None, 28, 28, 1)         0         
_________________________________________________________________
2.conv2d_1 (Conv2D)            (None, 28, 28, 16)        160       
_________________________________________________________________
3.max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0         
_________________________________________________________________
4.conv2d_2 (Conv2D)            (None, 14, 14, 8)         1160      
_________________________________________________________________
5.max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)           0         
_________________________________________________________________
6.conv2d_3 (Conv2D)            (None, 7, 7, 8)           584       
_________________________________________________________________
7.max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)           0         
_________________________________________________________________
8.conv2d_4 (Conv2D)            (None, 4, 4, 8)           584       
_________________________________________________________________
9.up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0         
_________________________________________________________________
10.conv2d_5 (Conv2D)            (None, 8, 8, 8)           584       
_________________________________________________________________
11.up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0         
_________________________________________________________________
12.conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168      
_________________________________________________________________
13.up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0         
_________________________________________________________________
14.conv2d_7 (Conv2D)            (None, 28, 28, 1)         145       
=================================================================
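The shapes and parameter counts in this summary can be reproduced by hand. Below is a minimal pure-Python sketch, assuming the TensorFlow/Keras output-size conventions (ceil(n / stride) for 'same', floor((n - k) / stride) + 1 for 'valid'); the helpers out_size and walk and the LAYERS table are hypothetical illustrations, not part of Keras.

```python
import math

def out_size(n, kernel, stride, padding):
    """Output length of one spatial dimension (assumed TensorFlow/Keras convention)."""
    if padding == "same":
        return math.ceil(n / stride)            # kernel size does not enter the formula
    if padding == "valid":
        return (n - kernel) // stride + 1       # the window must fit inside the input
    raise ValueError(f"unknown padding: {padding!r}")

# Hypothetical table mirroring the autoencoder code, layers 2-14:
# (kind, filters / pool stride / upsampling factor, kernel or pool size, padding)
LAYERS = [
    ("conv", 16, 3, "same"),
    ("pool", 2, 2, "same"),
    ("conv", 8, 3, "same"),
    ("pool", 2, 2, "same"),
    ("conv", 8, 3, "same"),
    ("pool", 2, 2, "same"),
    ("conv", 8, 3, "same"),
    ("up", 2, None, None),
    ("conv", 8, 3, "same"),
    ("up", 2, None, None),
    ("conv", 16, 3, "valid"),   # layer 12: no padding argument, so Conv2D uses 'valid'
    ("up", 2, None, None),
    ("conv", 1, 3, "same"),
]

def walk(shape, layers):
    """Return [(kind, (h, w, c), params), ...], one entry per layer."""
    h, w, c = shape
    rows = []
    for kind, arg, kernel, padding in layers:
        if kind == "conv":
            # Conv2D: strides default to 1; params = (k * k * c_in + 1 bias) * filters
            params = (kernel * kernel * c + 1) * arg
            h, w = out_size(h, kernel, 1, padding), out_size(w, kernel, 1, padding)
            c = arg
        elif kind == "pool":
            # MaxPooling2D: strides=None falls back to pool_size, so stride == 2 here
            params = 0
            h, w = out_size(h, kernel, arg, padding), out_size(w, kernel, arg, padding)
        elif kind == "up":
            # UpSampling2D repeats each row and column `arg` times
            params = 0
            h, w = h * arg, w * arg
        else:
            raise ValueError(f"unknown layer kind: {kind!r}")
        rows.append((kind, (h, w, c), params))
    return rows

for number, (kind, shape, params) in enumerate(walk((28, 28, 1), LAYERS), start=2):
    print(f"{number:2d}. {kind:4s} {str(shape):14s} {params}")
```

Each printed row should line up with the corresponding numbered row of the summary; the only 'valid' layer is layer 12, which is where 16 X 16 turns into 14 X 14.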

The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). The summary() shows the output size of this layer as 14 X 14 X 16. But the padding in this layer is also 'same'. So why doesn't the output size remain 28 X 28 X 16, with padded zeros?

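There seems to be a misunderstanding of what padding does. Padding only handles the edge cases, i.e. what to do at the image border. But you have a 2x2 max-pooling operation, and in Keras the stride defaults to the pool size, so the stride is 2 and the image size is halved. To avoid this, you need to set strides=1 explicitly. From the Keras documentation: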

pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.

strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
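To separate the effect of the stride from the effect of padding, here is a small pure-Python sketch of the pooling output-size rule. It assumes TensorFlow/Keras conventions, and pool_out is a hypothetical helper, not a Keras function. The stride, not the padding, turns 28 into 14; padding only matters when the window does not tile the input evenly, as in layer 7, where 7 becomes 4 with 'same' but would become 3 with 'valid'.

```python
import math

def pool_out(n, pool_size=2, strides=None, padding="valid"):
    """Output size of MaxPooling2D along one spatial dimension.

    Assumes TensorFlow/Keras conventions: strides=None falls back to pool_size;
    'same' gives ceil(n / s), 'valid' gives floor((n - pool_size) / s) + 1.
    """
    s = pool_size if strides is None else strides
    if padding == "same":
        return math.ceil(n / s)
    if padding == "valid":
        return (n - pool_size) // s + 1
    raise ValueError(f"unknown padding: {padding!r}")

print(pool_out(28, 2, padding="same"))             # 14: default stride 2 halves the size
print(pool_out(28, 2, strides=1, padding="same"))  # 28: stride 1 keeps the size
print(pool_out(7, 2, padding="same"))              # 4: 'same' pads the odd edge
print(pool_out(7, 2, padding="valid"))             # 3: 'valid' drops the last row/column
```

In Keras itself, the size-preserving variant would be written as MaxPooling2D((2, 2), strides=(1, 1), padding='same').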

For the second question:

Also, it is not clear how the output shape changes to (14 X 14 X 16) after layer 12, when the input shape coming from the layer before it is (16 X 16 X 8).

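Layer 12 does not specify padding='same', so Conv2D falls back to its default, padding='valid' (no padding). Without padding, a 3x3 kernel shrinks each spatial dimension by 2, so 16 X 16 becomes 14 X 14, and the layer's 16 filters produce 16 channels.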
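A minimal sketch of the arithmetic for layer 12, assuming the standard 'valid' output-size formula floor((n - k) / s) + 1; conv_valid_out is a hypothetical helper, not a Keras function.

```python
def conv_valid_out(n, kernel=3, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the input,
    # so at stride 1 each spatial dimension shrinks by (kernel - 1)
    return (n - kernel) // stride + 1

h = w = conv_valid_out(16)   # 16 - 3 + 1 = 14
channels = 16                # Conv2D(16, ...) -> 16 output channels
print((h, w, channels))      # (14, 14, 16)
```

The shrink appears to be deliberate in this architecture: the following UpSampling2D((2, 2)) doubles 14 to 28, so the decoder's output matches the 28 X 28 input.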