Transfer learning on MobileNetV3 reaches a plateau and I can't move past it

I'm trying to use TensorFlow 2.5.0 to do transfer learning on MobileNetV3-Small to predict dog breeds (133 classes). Since it achieves reasonable accuracy on the ImageNet dataset (1000 classes), I figured adapting it to my problem should work fine.

I've tried multiple training variants and recently had a breakthrough, but now training has stalled at around 60% validation accuracy, with the validation loss fluctuating slightly (accuracy and loss curves for training and validation are shown below).

The third chart below shows an attempt with ReduceLROnPlateau, but it didn't help. Can anyone suggest how I can improve the training?

from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import GlobalMaxPooling2D, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import MobileNetV3Large, MobileNetV3Small
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True # needed for working with this dataset


# define generators
train_datagen = ImageDataGenerator(vertical_flip=True, horizontal_flip=True,
                                   rescale=1.0/255, brightness_range=[0.5, 1.5],
                                   zoom_range=[0.5, 1.5], rotation_range=90)
test_datagen = ImageDataGenerator(rescale=1.0/255)

train_gen = train_datagen.flow_from_directory(train_dir, target_size=(224,224),
                                              batch_size=32, class_mode="categorical")
val_gen = test_datagen.flow_from_directory(val_dir, target_size=(224,224),
                                           batch_size=32, class_mode="categorical")
test_gen = test_datagen.flow_from_directory(test_dir, target_size=(224,224),
                                            batch_size=32, class_mode="categorical")

# note: classes is ignored when include_top=False; the 133-way head is added manually below
pretrained_model = MobileNetV3Small(input_shape=(224,224,3), classes=133,
                                    weights="imagenet", pooling=None, include_top=False)
# set all layers trainable because when I froze most of the layers the model didn't learn so well
for layer in pretrained_model.layers:
    layer.trainable = True
last_output = pretrained_model.layers[-1].output
x = GlobalMaxPooling2D()(last_output)
x = BatchNormalization()(x)
x = Dense(512, activation='relu')(x)
x = Dense(133, activation='softmax')(x)
model = Model(pretrained_model.input, x)

model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])

# val_acc with min_delta 0.003; val_loss with min_delta 0.01
plateau = ReduceLROnPlateau(monitor="val_loss", mode="min", patience=5,
                            min_lr=1e-8, factor=0.3, min_delta=0.01,
                            verbose=1)
checkpointer = ModelCheckpoint(filepath=savepath, verbose=1, save_best_only=True,
                               monitor="val_accuracy", mode="max",
                               save_weights_only=True)
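
The call to model.fit isn't shown above; a minimal sketch of how the pieces fit together (the epoch count is a placeholder, not from the original setup):

# sketch: wire the generators and callbacks into training
history = model.fit(train_gen,
                    validation_data=val_gen,
                    epochs=50,  # placeholder value
                    callbacks=[plateau, checkpointer])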

Your code looks good, but there seems to be one problem: you may be rescaling the input twice. According to the docs for MobileNetV3:

The preprocessing logic has been included in the mobilenet_v3 model implementation. Users are no longer required (...) to normalize the input data.
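
In TF 2.5 this built-in preprocessing is a Rescaling layer near the model input, which maps [0, 255] to [-1, 1]. A quick way to check this for yourself, assuming the pretrained_model object from the question (layer names may differ between TF versions, so treat this as a sanity-check sketch):

# inspect the first few layers; one of them should be a Rescaling layer
for layer in pretrained_model.layers[:5]:
    print(layer.name, type(layer).__name__)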

Now, in your code there is: test_datagen = ImageDataGenerator(rescale=1.0/255)

This essentially makes the first model layer rescale values that have already been rescaled. The same applies to train_datagen.
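
To see how destructive this is, note that MobileNetV3's internal rescaling is x / 127.5 - 1 (per the Keras source). If the generator has already mapped pixels into [0, 1], the model then squeezes them into roughly [-1.0, -0.992], leaving almost no dynamic range:

import numpy as np

x = np.array([0.0, 1.0])   # range after the generator's rescale=1/255
print(x / 127.5 - 1.0)     # [-1.0, -0.99215686]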

You can try removing the rescale argument from the train and test generators, or setting rescale=None.

This could also explain why the model didn't learn well with the backbone frozen.
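
For reference, a minimal sketch of the corrected generators, keeping the same augmentation as in the question but without the double rescaling:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# no rescale: MobileNetV3 normalizes its input internally
train_datagen = ImageDataGenerator(vertical_flip=True, horizontal_flip=True,
                                   brightness_range=[0.5, 1.5],
                                   zoom_range=[0.5, 1.5], rotation_range=90)
test_datagen = ImageDataGenerator()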