Transfer learning: model is giving unchanged loss results. Is it not training?

Question

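I am trying to train a regression model on top of Inception V3. The inputs are images of size (96,320,3). There are 16k+ images in total, of which 12k+ are used for training and the rest for validation. I have frozen all the layers in Inception, but unfreezing them does not help either (I have already tried). I have replaced the top of the pretrained model with a few layers, as shown in the code below.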

X_train = preprocess_input(X_train)
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
inception.trainable = False
print(inception.summary())

driving_input = Input(shape=(96,320,3))
resized_input = Lambda(lambda image: tf.image.resize(image,(299,299)))(driving_input)
inp = inception(resized_input)

x = GlobalAveragePooling2D()(inp)

x = Dense(512, activation = 'relu')(x)
x = Dense(256, activation = 'relu')(x)
x = Dropout(0.25)(x)
x = Dense(128, activation = 'relu')(x)
x = Dense(64, activation = 'relu')(x)
x = Dropout(0.25)(x)
result = Dense(1, activation = 'relu')(x)

lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=100000, decay_rate=0.95)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
model = Model(inputs = driving_input, outputs = result)
model.compile(optimizer=optimizer, loss=loss)

checkpoint = ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)

batch_size = 32
epochs = 100

model.fit(x=X_train, y=y_train, shuffle=True, validation_split=0.2, epochs=epochs, 
          batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])

The results are as follows:

Why is my model not training, and how can I fix it?

Answer 1

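Since your problem is a regression problem, the activation of the last layer should be linear rather than relu. With relu on the output, a unit whose pre-activation becomes negative outputs 0 and passes back a zero gradient, so the loss can stop changing. The learning rate is also too high: an initial value of 0.1 is far above the Keras default of 0.001 for Adam, so you should lower it to suit your overall setup. Below is a working code example on MNIST with both changes applied.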

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout

# data
(xtrain, train_target), (xtest, test_target) = tf.keras.datasets.mnist.load_data()
# MNIST is grayscale, so repeat the single channel 3 times to match the pretrained weights
x_train = np.expand_dims(xtrain, axis=-1)
x_train = np.repeat(x_train, 3, axis=-1)
x_train = x_train.astype('float32') / 255
# prepare the labels for the regression model (the square of each digit)
ytrain4 = tf.square(tf.cast(train_target, tf.float32))

# base model 
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(75,75,3))
inception.trainable = False

# inputs layer
driving_input = tf.keras.layers.Input(shape=(28,28,3))
resized_input = tf.keras.layers.Lambda(lambda image: tf.image.resize(image,(75,75)))(driving_input)
inp = inception(resized_input)

# top model 
x = GlobalAveragePooling2D()(inp)
x = Dense(512, activation = 'relu')(x)
x = Dense(256, activation = 'relu')(x)
x = Dropout(0.25)(x)
x = Dense(128, activation = 'relu')(x)
x = Dense(64, activation = 'relu')(x)
x = Dropout(0.25)(x)
result = Dense(1, activation = 'linear')(x)

# hyper-param
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=0.0001, 
                                                             decay_steps=100000, decay_rate=0.95)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
loss = tf.keras.losses.Huber(delta=0.5, reduction="auto", name="huber_loss")

# build models
model = tf.keras.Model(inputs = driving_input, outputs = result)
model.compile(optimizer=optimizer, loss=loss)

# callbacks
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss', save_best_only=True)
stopper = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)

batch_size = 32
epochs = 10

# fit 
model.fit(x=x_train, y=ytrain4, shuffle=True, validation_split=0.2, epochs=epochs, 
          batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])

Output

Epoch 1/10
1500/1500 [==============================] - 27s 18ms/step - loss: 5.2239 - val_loss: 3.6060
Epoch 2/10
1500/1500 [==============================] - 26s 17ms/step - loss: 3.5634 - val_loss: 2.9022
Epoch 3/10
1500/1500 [==============================] - 26s 17ms/step - loss: 3.0629 - val_loss: 2.5063
Epoch 4/10
1500/1500 [==============================] - 26s 17ms/step - loss: 2.7615 - val_loss: 2.3764
Epoch 5/10
1500/1500 [==============================] - 26s 17ms/step - loss: 2.5371 - val_loss: 2.1303
Epoch 6/10
1500/1500 [==============================] - 26s 17ms/step - loss: 2.3848 - val_loss: 2.1373
Epoch 7/10
1500/1500 [==============================] - 26s 17ms/step - loss: 2.2653 - val_loss: 1.9039
Epoch 8/10
1500/1500 [==============================] - 26s 17ms/step - loss: 2.1581 - val_loss: 1.9087
Epoch 9/10
1500/1500 [==============================] - 26s 17ms/step - loss: 2.0518 - val_loss: 1.7193
Epoch 10/10
1500/1500 [==============================] - 26s 17ms/step - loss: 1.9699 - val_loss: 1.8837
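
To illustrate why a relu output can freeze the loss, here is a minimal, framework-free sketch. It is not taken from the question's model: it assumes a single hypothetical output unit with weight w and bias b, a squared-error loss instead of Huber, and made-up sample values. It only shows that once the pre-activation is negative for every sample, the relu unit's weight gradient is zero, while a linear unit still gets a useful gradient.

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Derivative of relu: 1 for positive inputs, 0 otherwise
    return 1.0 if z > 0 else 0.0


def linear(z):
    return z


def linear_grad(z):
    # Derivative of the identity is 1 everywhere
    return 1.0


def weight_grad(x, y, w, b, act, act_grad):
    """Gradient of 0.5 * (act(w*x + b) - y)**2 with respect to w, for one sample."""
    z = w * x + b
    return (act(z) - y) * act_grad(z) * x


# Hypothetical state after an overly large update: the bias is strongly negative,
# so w*x + b < 0 for every sample below.
w, b = 0.5, -10.0
samples = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]  # (feature, target), made up

relu_grads = [weight_grad(x, y, w, b, relu, relu_grad) for x, y in samples]
linear_grads = [weight_grad(x, y, w, b, linear, linear_grad) for x, y in samples]

print("relu output grads:  ", relu_grads)
print("linear output grads:", linear_grads)
```

Every relu gradient is zero here, so gradient descent cannot move w, and the loss stays where it is. With the linear output the gradients are non-zero, so training can recover.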

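As a rough check on the learning-rate point, the sketch below evaluates the non-staircase exponential decay formula, initial_lr * decay_rate ** (step / decay_steps), in plain Python. The sample count of 12,000 is an assumption (the question only says "12k+"), and the helper name exponential_decay is made up for this example; validation_split=0.2, batch_size=32 and 100 epochs come from the question's fit() call.

```python
def exponential_decay(initial_lr, decay_steps, decay_rate, step):
    """Non-staircase exponential decay: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Assumed: 12,000 images passed to fit(), of which 20% are held out for validation
train_samples = 12_000 * 8 // 10           # 9600 samples actually trained on
steps_per_epoch = -(-train_samples // 32)  # ceiling division by batch_size=32
total_steps = steps_per_epoch * 100        # 100 epochs, if early stopping never fires

# Learning rate at the end of training for the question's and the answer's settings
lr_question_end = exponential_decay(0.1, 100_000, 0.95, total_steps)
lr_answer_end = exponential_decay(0.0001, 100_000, 0.95, total_steps)

print("total optimizer steps:", total_steps)
print("question's schedule, final lr:", lr_question_end)
print("answer's schedule, final lr:  ", lr_answer_end)
```

Under these assumptions the whole run covers far fewer than decay_steps=100000 steps, so the schedule barely lowers the rate: the question's setup stays close to 0.1 for all of training. Lowering initial_learning_rate is what matters here, not the decay.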
