具有非方形输入形状的 CNN 自动编码器

CNN autoencoder with non square input shapes

我实现了一个没有方形输入的 CNN 自动编码器。我有点困惑。是否必须为自动编码器输入平方形状?每个 2D 图像的形状都是 800x20。我已经根据形状输入了数据。但是不知何故,在构建模型时形状不匹配。我已经分享了模型的代码和下面的错误消息。需要您的专家建议。谢谢

x = Input(shape=(800, 20,1)) 

# Encoder
conv1_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
pool1 = MaxPooling2D((2, 2), padding='same')(conv1_1)
conv1_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool1)
pool2 = MaxPooling2D((2, 2), padding='same')(conv1_2)
conv1_3 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool2)
h = MaxPooling2D((2, 2), padding='same')(conv1_3)

# Decoder
conv2_1 = Conv2D(8, (3, 3), activation='relu', padding='same')(h)
up1 = UpSampling2D((2, 2))(conv2_1)
conv2_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(up1)
up2 = UpSampling2D((2, 2))(conv2_2)
conv2_3 = Conv2D(16, (3, 3), activation='relu', padding='same')(up2)
up3 = UpSampling2D((2, 2))(conv2_3)
r = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up3)

model = Model(inputs=x, outputs=r)
model.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy'])

results = model.fit(x_train, x_train, epochs = 500, batch_size=16,validation_data= (x_test, x_test))

这是错误:

    ValueError: logits and labels must have the same shape ((16, 800, 24, 1) vs (16, 800, 20, 1))

如错误跟踪所示,您在尝试使用此自动编码器时遇到的问题是模型期望输入形状为 (?, 800, 20, 1),但输出形状为 (?, 796, 20, 1) .虽然它将每个图像输入和输出为

(800, 20, 1)(检查模型摘要的输入和输出形状!)

我的建议 -

  1. 我已修复 padding = 'same' 并重新调整形状,使输入张量形状与输出相同。查看我修改的内核大小以获得所需的输出形状。

  2. 最重要的是,对于堆叠式 conv 编码器-解码器架构(例如自动编码器),建议您将空间信息转换为具有后续层的特征 maps/filters/channels。现在您从 8 个过滤器开始,然后在编码器中移动到 4 个。它应该更像 4->8->16。检查此图以供参考。

根据以上建议,我在这里做了修改。

x = Input(shape=(800, 20,1)) 

# Encoder
conv1_1 = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
pool1 = MaxPooling2D((2, 2), padding='same')(conv1_1)
conv1_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(pool1)
pool2 = MaxPooling2D((2, 2), padding='same')(conv1_2)
conv1_3 = Conv2D(16, (3, 3), activation='relu', padding='same')(pool2)
h = MaxPooling2D((2, 1), padding='same')(conv1_3) #<------

# # Decoder

conv2_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(h)
up1 = UpSampling2D((2, 2))(conv2_1)
conv2_2 = Conv2D(8, (3, 3), activation='relu', padding='same')(up1)
up2 = UpSampling2D((2, 2))(conv2_2)
conv2_3 = Conv2D(4, (3, 3), activation='sigmoid', padding='same')(up2) #<--- ADD PADDING HERE
up3 = UpSampling2D((2, 1))(conv2_3)
r = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up3)

model = Model(inputs=x, outputs=r)
model.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
Model: "model_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_16 (InputLayer)        [(None, 800, 20, 1)]      0         
_________________________________________________________________
conv2d_101 (Conv2D)          (None, 800, 20, 4)        40        
_________________________________________________________________
max_pooling2d_45 (MaxPooling (None, 400, 10, 4)        0         
_________________________________________________________________
conv2d_102 (Conv2D)          (None, 400, 10, 8)        296       
_________________________________________________________________
max_pooling2d_46 (MaxPooling (None, 200, 5, 8)         0         
_________________________________________________________________
conv2d_103 (Conv2D)          (None, 200, 5, 16)        1168      
_________________________________________________________________
max_pooling2d_47 (MaxPooling (None, 100, 5, 16)        0         
_________________________________________________________________
conv2d_104 (Conv2D)          (None, 100, 5, 16)        2320      
_________________________________________________________________
up_sampling2d_43 (UpSampling (None, 200, 10, 16)       0         
_________________________________________________________________
conv2d_105 (Conv2D)          (None, 200, 10, 8)        1160      
_________________________________________________________________
up_sampling2d_44 (UpSampling (None, 400, 20, 8)        0         
_________________________________________________________________
conv2d_106 (Conv2D)          (None, 400, 20, 4)        292       
_________________________________________________________________
up_sampling2d_45 (UpSampling (None, 800, 20, 4)        0         
_________________________________________________________________
conv2d_107 (Conv2D)          (None, 800, 20, 1)        37        
=================================================================
Total params: 5,313
Trainable params: 5,313
Non-trainable params: 0
_________________________________________________________________

为了证明模型现在可以运行,让我创建一个与您正在查看的相同的随机数据集。

x_train = np.random.random((16,800,20,1))

model.fit(x_train, x_train, epochs=2)
Epoch 1/2
1/1 [==============================] - 0s 180ms/step - loss: 0.7526 - accuracy: 0.0000e+00
Epoch 2/2
1/1 [==============================] - 0s 189ms/step - loss: 0.7526 - accuracy: 0.0000e+00
<tensorflow.python.keras.callbacks.History at 0x7fa85a9896d0>