Fine-tuning of Keras autoencoders of cat images

Question

I want to use an autoencoder on real-world photos (instead of simple MNIST digits). I took the cats and dogs dataset and trained on it. My parameters are:

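I stuck with grayscale and images downscaled to 128x128 pixels, plus some preprocessing in ImageDataGenerator for data augmentation.
I train on about 2000 images of cats and dogs. I could use 10000, but training then takes too long.
I chose a convolutional network with a basic downsampler and upsampler, tuned the parameters, and ended up with a bottleneck of 8x8x8 = 512 values (1/32 of the 128x128 px original image).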
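A quick sanity check of the bottleneck arithmetic above, in plain Python. This is only a sketch; it assumes nothing beyond the four 2x2 `MaxPooling2D` layers with `padding='same'` and the 8-filter last encoder layer from the code below.

```python
import math

def pooled_size(size: int, num_pools: int, pool: int = 2) -> int:
    """Spatial size after num_pools 'same'-padded poolings (ceil division)."""
    for _ in range(num_pools):
        size = math.ceil(size / pool)
    return size

img_x = img_y = 128
side = pooled_size(img_x, num_pools=4)    # 128 -> 64 -> 32 -> 16 -> 8
bottleneck = side * side * 8              # 8 filters in the last encoder conv
ratio = bottleneck / (img_x * img_y * 1)  # grayscale input has 1 channel

print(side, bottleneck, ratio)  # 8 512 0.03125 (= 1/32)
```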

Here is the Python code:

from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import metrics
from keras.callbacks import EarlyStopping
import os

root_dir = '/opt/data/pets'
epochs = 400 # epochs of training, the more the better
batch_size = 64 # number of images to be yielded from the generator per batch
seed = 4321 # constant seed for constant conditions
# keras image input type definition
img_channel = 1 # 1 for grayscale, 3 for color
# dimension of input image for network, the bigger the more CPU and RAM is used
img_x, img_y = 128, 128
input_img = Input(shape = (img_x, img_y, img_channel))

# this is the augmentation configuration we use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

# this is the augmentation configuration we will use for testing
test_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of root_dir + '/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        root_dir + '/train',  # this is the target directory
        target_size=(img_x, img_y), # all images will be resized
        batch_size=batch_size,
        color_mode='grayscale',
        class_mode='input', # necessary for autoencoder: the targets are the inputs
        shuffle=False, # keeps the file order fixed so filenames[i] matches image i
        seed = seed)

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        root_dir + '/validation',
        target_size=(img_x, img_y),
        batch_size=batch_size,
        color_mode='grayscale',
        class_mode='input',  # necessary for autoencoder: the targets are the inputs
        shuffle=False,  # keeps the file order fixed so filenames[i] matches image i
        seed = seed)

# create convolutional autoencoder inspired from https://blog.keras.io/building-autoencoders-in-keras.html
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(img_channel, (3, 3), activation='sigmoid', padding='same')(x) # sigmoid keeps outputs in [0, 1], matching rescale=1./255

autoencoder = Model(input_img, decoded)
autoencoder.summary() # show model data

# categorical_accuracy is meaningless for pixel-wise reconstruction, so only MAE is tracked
autoencoder.compile(optimizer='sgd', loss='mean_squared_error', metrics=[metrics.mae])

# do not run forever but stop if model does not get better
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, mode='auto', verbose=1)

# do the actual fitting
autoencoder_train = autoencoder.fit_generator(
        train_generator,
        validation_data=validation_generator,
        epochs=epochs,
        shuffle=False,
        callbacks=[stopper])

# create an encoder for debugging purposes later
encoder = Model(input_img, encoded)

# save the model parameters to a file
autoencoder.save(os.path.basename(__file__) + '_model.hdf')

## PLOTS ####################################
import matplotlib.pyplot as plt
# Plot loss over epochs    
print(autoencoder_train.history.keys())
plt.plot(autoencoder_train.history['loss'])
plt.plot(autoencoder_train.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()


# Plot original, encoded and predicted image
import numpy as np
images_show_start = 1
images_show_stop = 20
images_show_number = images_show_stop - images_show_start +1

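# take the first, unaugmented validation batch so that validation_generator.filenames[i] labels images[i]
validation_generator.reset()
images, _ = validation_generator.next()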
plt.figure(figsize=(30, 5))
for i in range(images_show_start, images_show_stop):
    # original image
    ax = plt.subplot(3, images_show_number, i +1)
    image = images[i,:,:,0]
    image_reshaped = np.reshape(image, [1, 128, 128, 1])
    plt.imshow(image,cmap='gray')

    # label
    image_label = os.path.dirname(validation_generator.filenames[i])
    plt.title(image_label) # only OK if shuffle=false

    # encoded image
    ax = plt.subplot(3, images_show_number, i + 1+1*images_show_number)
    image_encoded = encoder.predict(image_reshaped)
    # adjust shape if the network parameters are adjusted (8x8x8 = 512 = 16x32 values)
    image_encoded_reshaped = np.reshape(image_encoded, [16,32])
    plt.imshow(image_encoded_reshaped,cmap='gray')

    # predicted image
    ax = plt.subplot(3, images_show_number, i + 1+ 2*images_show_number)
    image_pred = autoencoder.predict(image_reshaped)
    image_pred_reshaped = np.reshape(image_pred, [128,128])
    plt.imshow(image_pred_reshaped,cmap='gray')
plt.show()

You can see the layers in the network configuration. What do you think: is it too deep or too simple? What adjustments could be made?

The loss should decrease over time.

Each column shows three images:

the original (downscaled) image,
the encoded image, and
the predicted one.

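So I wonder why the encoded images look so similar in their features (apart from the fact that they are all cats) and show so many vertical lines. The encoded image is fairly large at 8x8x8 values, which I plot as 16x32 pixels; that is 1/32 of the pixels of the original image. Is the quality of the decoded image good enough? Can it be improved somehow? Can I make the bottleneck of the autoencoder even smaller? When I try a smaller bottleneck, the loss gets stuck at 0.06 and the predicted images are very bad.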

Answer 1

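Your model contains only very few parameters (~32,000). These may not be enough to handle the data and to learn the data-generating probability distribution. Each pooling step halves the width and height of your feature maps, but you do not increase the number of filters to compensate (you even reduce it). This means your network is not volume-preserving but strongly contracting, which may be too much. I would first try to increase the number of parameters and check whether this makes the images less blurry. Then, if the images actually improve with more parameters (they should, because the compression is now weaker than before), you can reduce the number of parameters (i.e. the size of the compressed state) again. Proceeding this way can also help you uncover other problems in your code.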
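As a rough illustration of both points, the following plain-Python sketch counts the trainable parameters of the question's `Conv2D` layers and tracks the activation volume (height x width x channels) after each pooling stage. The "wider" filter schedule `[32, 64, 128, 256]` is a hypothetical alternative chosen only for comparison, not something from the question; `autoencoder.summary()` on the original model should report the same parameter total.

```python
def conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Trainable parameters of a k x k Conv2D layer: weights plus biases."""
    return k * k * in_ch * out_ch + out_ch

# (in_channels, out_channels) of every Conv2D layer in the question's model
conv_layers = [
    (1, 32), (32, 32), (32, 16), (16, 8), (8, 8),          # encoder
    (8, 8), (8, 8), (8, 16), (16, 32), (32, 32), (32, 1),  # decoder
]
total_params = sum(conv_params(i, o) for i, o in conv_layers)

def stage_volumes(side: int, filters_per_stage: list) -> list:
    """Activation volume (H * W * C) after each 2x2 pooling stage."""
    volumes = []
    for filters in filters_per_stage:
        side //= 2
        volumes.append(side * side * filters)
    return volumes

# Channels leaving each pooling stage in the question's encoder ...
original = stage_volumes(128, [32, 32, 8, 8])
# ... and in a hypothetical wider encoder that doubles the filters per stage.
wider = stage_volumes(128, [32, 64, 128, 256])

print(total_params)  # 32449
print(original)      # [131072, 32768, 2048, 512]
print(wider)         # [131072, 65536, 32768, 16384]
```

In the question's encoder the volume shrinks by a factor of 4 or more per stage, while doubling the filters only halves it per stage; the wider variant has no compression at all at the bottleneck, which is why the answer suggests shrinking it again afterwards.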

Maybe take a look at existing autoencoder implementations in Keras that work on other datasets (including more complex data), such as this one using CIFAR10.

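The black lines in the images of the encoded state may simply come from the way you plot the data. Since the data in this layer has a depth of 8 rather than 1, you have to reshape it to plot it. If the original cube has low values at its borders (which makes sense, because there is probably not much important information there), you rearrange the dark/black surfaces of the cube and project them onto a 2D plane; this can look like repeated black lines.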
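To illustrate the plotting effect, here is a NumPy sketch on a made-up 8x8x8 tensor (not real encoder output) in which one channel is all zeros, as a "dead" ReLU channel could be. With a plain `np.reshape` to 16x32, as in the question's plotting code, the channel index varies fastest, so column `j` always shows channel `j % 8` and a dark channel turns into vertical stripes. The helper `tile_channels` is hypothetical; it lays each channel out as its own 8x8 tile instead.

```python
import numpy as np

# Made-up encoder output for one image: 8 x 8 spatial, 8 channels,
# with channel 3 "dead" (all zeros).
rng = np.random.default_rng(0)
encoded = rng.uniform(0.1, 1.0, size=(8, 8, 8))
encoded[:, :, 3] = 0.0

# What the question's plotting code does: a plain row-major reshape.
flat_view = np.reshape(encoded, (16, 32))
# Column j always holds channel j % 8, so the dead channel becomes vertical lines.
dead_columns = [j for j in range(32) if np.all(flat_view[:, j] == 0)]
print(dead_columns)  # [3, 11, 19, 27]

def tile_channels(feature_map, rows, cols):
    """Lay out the channels of an (H, W, C) map as a rows x cols grid of H x W tiles."""
    h, w, c = feature_map.shape
    assert c == rows * cols, "grid must hold exactly one tile per channel"
    grid = feature_map.reshape(h, w, rows, cols).transpose(2, 0, 3, 1)
    return grid.reshape(rows * h, cols * w)

tiled = tile_channels(encoded, rows=2, cols=4)  # also 16 x 32, one 8x8 tile per channel
```

In the question's loop, `np.reshape(image_encoded, [16,32])` could be replaced by `tile_channels(image_encoded[0], 2, 4)`, since `encoder.predict` returns a batch of shape `(1, 8, 8, 8)`. A dead channel then shows up as one dark tile rather than as stripes.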

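Also, judging from the loss plot of your network, training may simply not have converged yet. If you keep training, the image quality may still improve.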
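The question stops training with `EarlyStopping(patience=2, min_delta=0.0001)`. The sketch below is a simplified re-implementation of that patience rule (my reading of how Keras applies it to a minimized metric, not Keras's own code), run on made-up validation losses, to show how a short plateau can end training before later improvements arrive. The function name `stop_epoch` is hypothetical.

```python
def stop_epoch(val_losses, patience, min_delta=1e-4):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified patience rule: an epoch only counts as an improvement if the
    loss drops more than min_delta below the best value seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: a short plateau followed by further improvement.
history = [0.100, 0.090, 0.0905, 0.0902, 0.080, 0.075]
print(stop_epoch(history, patience=2))   # 3: stops on the plateau
print(stop_epoch(history, patience=10))  # None: reaches the later improvements
```

A larger `patience` gives slow SGD training more room to get past such plateaus.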

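Finally, you should use all available training images, not just a small subset. This will (of course) increase training time, but the results of the encoder will be better, because the network will be more resistant to overfitting and will most likely generalize better.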

Shuffling the data may also improve training performance.
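One concrete effect of `shuffle=False`: as far as I understand `flow_from_directory`, it enumerates the class subdirectories in alphabetical order and lists each class's files together, so unshuffled batches are grouped by class. The simulation below uses hypothetical filenames (1000 per class, batch size 64) and Python's `random` module instead of Keras. This ordering might also explain why the plotted batch in the question contains only cats.

```python
import random

# Hypothetical file list, in the order flow_from_directory is assumed to
# enumerate it: class folders sorted alphabetically, files grouped per class.
filenames = ([f"cats/cat.{i}.jpg" for i in range(1000)]
             + [f"dogs/dog.{i}.jpg" for i in range(1000)])
batch_size = 64

def first_batch_classes(files, shuffle, seed=4321):
    """Return the set of class folders that appear in the first batch."""
    order = list(range(len(files)))
    if shuffle:
        random.Random(seed).shuffle(order)  # seeded, like seed=4321 in the question
    return {files[i].split("/")[0] for i in order[:batch_size]}

print(first_batch_classes(filenames, shuffle=False))  # {'cats'}
print(first_batch_classes(filenames, shuffle=True))   # almost certainly both classes
```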
