3D convolutional autoencoder is not returning the right output shape
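I am trying to use an autoencoder on spatio-temporal data.
My data has the shape (batches, filters, timesteps, rows, columns), and I am having trouble getting the autoencoder to produce the right output shape.
Here is my model: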
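from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, UpSampling3D
from tensorflow.keras.models import Model

input_imag = Input(shape=(3, 81, 4, 4))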
x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(input_imag)
x = MaxPooling3D((3, 2, 2), data_format='channels_first', padding='same')(x)
x = Conv3D(8, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = MaxPooling3D((3, 2, 2), data_format='channels_first', padding='same')(x)
x = Conv3D(4, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
encoded = MaxPooling3D((3, 2, 2), data_format='channels_first', padding='same', name='encoder')(x)
x = Conv3D(4, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(encoded)
x = UpSampling3D((3, 2, 2), data_format='channels_first')(x)
x = Conv3D(8, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = UpSampling3D((3, 2, 2), data_format='channels_first')(x)
x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = UpSampling3D((3, 2, 2), data_format='channels_first')(x)
decoded = Conv3D(3, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
autoencoder = Model(input_imag, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()
Here is the summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 3, 81, 4, 4)] 0
_________________________________________________________________
conv3d (Conv3D) (None, 16, 81, 4, 4) 2176
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 16, 27, 2, 2) 0
_________________________________________________________________
conv3d_1 (Conv3D) (None, 8, 27, 2, 2) 5768
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 8, 9, 1, 1) 0
_________________________________________________________________
conv3d_2 (Conv3D) (None, 4, 9, 1, 1) 1444
_________________________________________________________________
encoder (MaxPooling3D) (None, 4, 3, 1, 1) 0
_________________________________________________________________
conv3d_3 (Conv3D) (None, 4, 3, 1, 1) 724
_________________________________________________________________
up_sampling3d (UpSampling3D) (None, 4, 9, 2, 2) 0
_________________________________________________________________
conv3d_4 (Conv3D) (None, 8, 9, 2, 2) 1448
_________________________________________________________________
up_sampling3d_1 (UpSampling3 (None, 8, 27, 4, 4) 0
_________________________________________________________________
conv3d_5 (Conv3D) (None, 16, 27, 4, 4) 5776
_________________________________________________________________
up_sampling3d_2 (UpSampling3 (None, 16, 81, 8, 8) 0
_________________________________________________________________
conv3d_6 (Conv3D) (None, 3, 81, 8, 8) 2163
=================================================================
Total params: 19,499
Trainable params: 19,499
Non-trainable params: 0
What should I change so that the decoder output shape is [?, 3, 81, 4, 4] instead of [?, 3, 81, 8, 8]?
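It looks like you want the MaxPooling3D and UpSampling3D operations to be symmetric (at least in terms of output shape). Take another look at the input shape of the last MaxPooling3D layer: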
conv3d_2 (Conv3D) (None, 4, 9, 1, 1) 1444
_________________________________________________________________
encoder (MaxPooling3D) (None, 4, 3, 1, 1) 0
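The shape is (None, 4, 9, 1, 1). The last two dimensions are already 1, so they cannot be divided by 2 as pool_size specifies. So the MaxPooling3D layer, despite having pool_size=(3, 2, 2), effectively performs a pool_size=(3, 1, 1) operation. At least, I think that is what happens behind the scenes.
I am a little surprised that there is no error or warning when the specified pool_size is larger than the input size.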
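The shape arithmetic can be checked without building the model. With padding='same' and the default strides (equal to pool_size), a Keras pooling layer outputs ceil(n / stride) elements along each axis, so an axis of size 1 stays at 1 and no error is raised. A minimal sketch in plain Python; `same_pool_shape` is a hypothetical helper name used only for illustration, not part of Keras:

```python
import math

def same_pool_shape(dims, pool_size):
    """Per-axis output size of pooling with padding='same' and strides == pool_size."""
    return tuple(math.ceil(n / p) for n, p in zip(dims, pool_size))

dims = (81, 4, 4)  # (timesteps, rows, columns) from the question
pooled_shapes = []
for _ in range(3):  # the encoder applies MaxPooling3D((3, 2, 2)) three times
    dims = same_pool_shape(dims, (3, 2, 2))
    pooled_shapes.append(dims)

print(pooled_shapes)  # [(27, 2, 2), (9, 1, 1), (3, 1, 1)]
```

The last stage goes from (9, 1, 1) to (3, 1, 1): only the time axis shrinks, which matches the encoder rows in the summary.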
To fix this, you can set the size of the first UpSampling3D layer to (3, 1, 1):
x = UpSampling3D((3, 1, 1), data_format='channels_first')(x)
So, the complete solution:
input_imag = Input(shape=(3, 81, 4, 4))
x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(input_imag)
x = MaxPooling3D((3, 2, 2), data_format='channels_first', padding='same')(x)
x = Conv3D(8, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = MaxPooling3D((3, 2, 2), data_format='channels_first', padding='same')(x)
x = Conv3D(4, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
encoded = MaxPooling3D((3, 2, 2), data_format='channels_first', padding='same', name='encoder')(x)
x = Conv3D(4, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(encoded)
x = UpSampling3D((3, 1, 1), data_format='channels_first')(x)
x = Conv3D(8, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = UpSampling3D((3, 2, 2), data_format='channels_first')(x)
x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = UpSampling3D((3, 2, 2), data_format='channels_first')(x)
decoded = Conv3D(3, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
autoencoder = Model(input_imag, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()
Output:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 3, 81, 4, 4)] 0
conv3d_14 (Conv3D) (None, 16, 81, 4, 4) 2176
max_pooling3d_4 (MaxPooling (None, 16, 27, 2, 2) 0
3D)
conv3d_15 (Conv3D) (None, 8, 27, 2, 2) 5768
max_pooling3d_5 (MaxPooling (None, 8, 9, 1, 1) 0
3D)
conv3d_16 (Conv3D) (None, 4, 9, 1, 1) 1444
encoder (MaxPooling3D) (None, 4, 3, 1, 1) 0
conv3d_17 (Conv3D) (None, 4, 3, 1, 1) 724
up_sampling3d_6 (UpSampling (None, 4, 9, 1, 1) 0
3D)
conv3d_18 (Conv3D) (None, 8, 9, 1, 1) 1448
up_sampling3d_7 (UpSampling (None, 8, 27, 2, 2) 0
3D)
conv3d_19 (Conv3D) (None, 16, 27, 2, 2) 5776
up_sampling3d_8 (UpSampling (None, 16, 81, 4, 4) 0
3D)
conv3d_20 (Conv3D) (None, 3, 81, 4, 4) 2163
=================================================================
Total params: 19,499
Trainable params: 19,499
Non-trainable params: 0
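To avoid working out the decoder sizes by hand, the UpSampling3D factors can be derived from the encoder's shapes. The following is only a sketch under the same assumption as above (padding='same', strides equal to pool_size); `mirrored_upsampling_sizes` is a hypothetical helper, not a Keras function:

```python
import math

def mirrored_upsampling_sizes(input_dims, pool_sizes):
    """Return UpSampling3D sizes that undo each pooling stage, last stage first."""
    # Trace the shape through every 'same'-padded pooling stage.
    shapes = [tuple(input_dims)]
    for pool in pool_sizes:
        shapes.append(tuple(math.ceil(n / p) for n, p in zip(shapes[-1], pool)))

    sizes = []
    # Walk the stages backwards, pairing each pre-pool shape with its pooled shape.
    for before, after in zip(reversed(shapes[:-1]), reversed(shapes[1:])):
        if any(b % a for b, a in zip(before, after)):
            # Integer upsampling cannot restore a shape that was rounded up.
            raise ValueError(f"cannot upsample {after} back to {before}")
        sizes.append(tuple(b // a for b, a in zip(before, after)))
    return sizes

sizes = mirrored_upsampling_sizes((81, 4, 4), [(3, 2, 2)] * 3)
print(sizes)  # [(3, 1, 1), (3, 2, 2), (3, 2, 2)]
```

The first entry, (3, 1, 1), is the size used for the first UpSampling3D layer in the solution above; the remaining two stay at (3, 2, 2).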