Autoencoder for CIFAR-10 with low accuracy

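I am building a convolutional autoencoder whose objective is to encode an image and then decode it. However, I keep getting stuck at around 61% accuracy with a loss of about 0.0159. My code is below. I am not using batch normalization or dropout. I am not sure how to improve the accuracy.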

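# Imports (assuming the Keras API bundled with TensorFlow)
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Load CIFAR-10 (assumed loader; labels are unused by an autoencoder)
(x_train, _), (x_test, _) = cifar10.load_data()

# Convert to float32 and normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Define the input shape (CIFAR-10 images are 32x32 RGB)
img_width, img_height, img_channels = 32, 32, 3
input_img = Input(shape=(img_width, img_height, img_channels))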


x = Conv2D(64, (3, 3), activation='relu', padding='same') (input_img)
x = MaxPooling2D((2, 2)) (x)
x = Conv2D(32, (3, 3), activation='relu', padding='same') (x)
x = MaxPooling2D((2, 2)) (x)
x = Conv2D(16, (3, 3), activation='relu', padding='same') (x)
x = MaxPooling2D((2, 2)) (x)
x = Conv2D(8, (3, 3), activation='relu', padding='same') (x)
encoded = MaxPooling2D((2, 2)) (x)

x = Conv2D(8, (3, 3), activation='relu', padding='same') (encoded)
x = UpSampling2D((2, 2)) (x)
x = Conv2D(16, (3, 3), activation='relu', padding='same') (x)
x = UpSampling2D((2, 2)) (x)
x = Conv2D(32, (3, 3), activation='relu', padding='same') (x)
x = UpSampling2D((2, 2)) (x)
x = Conv2D(64, (3, 3), activation='relu', padding='same') (x)
x = UpSampling2D((2, 2)) (x)
decoded = Conv2D(3, (3, 3), padding='same') (x)

cae = Model(input_img,decoded)
cae.compile(optimizer = 'adam', loss ='mse', metrics=['accuracy'] )
cae.summary()

history = cae.fit(x_train,x_train,
       epochs = 25,
       batch_size = 50,
       validation_data = (x_test, x_test))

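Depending on the specific images you are encoding, it may be unreasonable to expect much higher accuracy than this. You downsample (MaxPooling2D) four times, which shrinks each spatial dimension by a factor of 16: a 32x32x3 CIFAR-10 image (3072 values) is squeezed into a 2x2x8 bottleneck (32 values). An autoencoder is essentially a compression algorithm in which the compression strategy (the encoding space) is learned. Typically, a compression algorithm can only hope to achieve a ratio of about 3:1 losslessly, so expecting far more from an autoencoder may be unreasonable.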

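That said, your use case might target a tightly constrained set of images (for example, a static camera, so the background is the same in every image). In that case you could hope for high accuracy despite the relatively large compression factor. My guess is that the CIFAR-10 input space is a bit too large to reconstruct images faithfully at your level of compression.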
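To make the compression factor concrete, here is a minimal sketch that counts how many values survive in the bottleneck for a given number of 2x2 pooling layers. The helper names `bottleneck_shape` and `compression_ratio` are hypothetical, and the sketch assumes 32x32x3 inputs, 'same'-padded convolutions, and an 8-channel bottleneck as in the question's code. It counts values rather than bits, so it is only a rough guide.

```python
def bottleneck_shape(height, width, num_pools, bottleneck_channels, pool_size=2):
    """Shape of the encoded tensor after `num_pools` pooling layers.

    Assumes 'same'-padded convolutions (which keep height and width) and
    non-overlapping pooling, so each pool divides height and width by pool_size.
    """
    for _ in range(num_pools):
        height //= pool_size
        width //= pool_size
    return (height, width, bottleneck_channels)


def compression_ratio(input_shape, num_pools, bottleneck_channels):
    """Ratio of input values to bottleneck values (values, not bits)."""
    h, w, c = input_shape
    bh, bw, bc = bottleneck_shape(h, w, num_pools, bottleneck_channels)
    return (h * w * c) / (bh * bw * bc)


cifar_shape = (32, 32, 3)  # assumed CIFAR-10 image shape
for pools in range(1, 5):
    shape = bottleneck_shape(32, 32, pools, 8)
    ratio = compression_ratio(cifar_shape, pools, 8)
    print(f"{pools} pooling layers -> bottleneck {shape}, ratio {ratio:.0f}:1")
```

With four pooling layers the bottleneck holds 32 values for 3072 inputs, which is far more aggressive than the lossless ratio mentioned above.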

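Higher accuracy can be achieved by lowering the compression ratio. In my earlier code, I removed one MaxPooling2D layer and one UpSampling2D layer, and my accuracy rose to 70%. The modified code snippet is below. Higher accuracy does not necessarily mean better performance. Since I am only compressing and decoding images, it all depends on the compression ratio and the final objective.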

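# Imports (assuming the Keras API bundled with TensorFlow)
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Load CIFAR-10 (assumed loader; labels are unused by an autoencoder)
(x_train, _), (x_test, _) = cifar10.load_data()

# Convert to float32 and normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Define the input shape (CIFAR-10 images are 32x32 RGB)
img_width, img_height, img_channels = 32, 32, 3
input_img = Input(shape=(img_width, img_height, img_channels))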


x = Conv2D(64, (3, 3), activation='relu', padding='same') (input_img)
x = MaxPooling2D((2, 2)) (x)
x = Conv2D(32, (3, 3), activation='relu', padding='same') (x)
x = MaxPooling2D((2, 2)) (x)
x = Conv2D(16, (3, 3), activation='relu', padding='same') (x)
# Removed this pooling layer to lower the compression ratio
# x = MaxPooling2D((2, 2)) (x)
x = Conv2D(8, (3, 3), activation='relu', padding='same') (x)
encoded = MaxPooling2D((2, 2)) (x)

x = Conv2D(8, (3, 3), activation='relu', padding='same') (encoded)
x = UpSampling2D((2, 2)) (x)
x = Conv2D(16, (3, 3), activation='relu', padding='same') (x)
x = UpSampling2D((2, 2)) (x)
x = Conv2D(32, (3, 3), activation='relu', padding='same') (x)
x = UpSampling2D((2, 2)) (x)
x = Conv2D(64, (3, 3), activation='relu', padding='same') (x)
# Removed the matching upsampling layer so the output size still equals the input size
# x = UpSampling2D((2, 2)) (x)
decoded = Conv2D(3, (3, 3), padding='same') (x)

cae = Model(input_img,decoded)
cae.compile(optimizer = 'adam', loss ='mse', metrics=['accuracy'] )
cae.summary()

history = cae.fit(x_train,x_train,
       epochs = 25,
       batch_size = 50,
       validation_data = (x_test, x_test))
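
Since accuracy alone does not settle how good the reconstructions are, it can help to look at pixel-wise error directly. The following is a minimal NumPy sketch, not part of the original code: the helpers `reconstruction_mse` and `psnr` are hypothetical names, the images are assumed to be scaled to [0, 1], and the arrays used in the demo are synthetic stand-ins for real images and model output.

```python
import numpy as np


def reconstruction_mse(original, reconstructed):
    """Per-image mean squared error over height, width, and channels."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return np.mean((original - reconstructed) ** 2, axis=(1, 2, 3))


def psnr(original, reconstructed, max_value=1.0):
    """Per-image peak signal-to-noise ratio in dB (higher is better).

    `max_value` is the largest possible pixel value; 1.0 assumes images
    were normalized to [0, 1]. A perfect reconstruction yields infinity.
    """
    mse = reconstruction_mse(original, reconstructed)
    with np.errstate(divide="ignore"):
        return 10.0 * np.log10(max_value ** 2 / mse)


# Synthetic stand-ins: random "images" and a noisy copy as the "reconstruction"
rng = np.random.default_rng(0)
images = rng.random((4, 32, 32, 3))
noisy = np.clip(images + rng.normal(0.0, 0.05, images.shape), 0.0, 1.0)

print("per-image MSE :", reconstruction_mse(images, noisy))
print("per-image PSNR:", psnr(images, noisy))
```

With the trained model, `reconstructed` would be the result of `cae.predict(x_test)` and `original` would be `x_test`. As a point of reference, an MSE of 0.0159 on [0, 1] images corresponds to roughly 18 dB.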