Transfer learning only works with trainable set to false
I have two models initialized like this:
vgg19 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))

for layer in vgg19.layers:
    layer.trainable = False

model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
and
vgg19_2 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))

model2 = Sequential(layers=vgg19_2.layers)
model2.add(Flatten())
model2.add(Dense(1024, activation='relu'))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(10, activation='softmax'))

opt = Adam(learning_rate=0.001, beta_1=0.9)
model2.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
In other words, the only difference is that the second model does not set the vgg19 layers' trainable parameter to false. Unfortunately, the model with trainable set to true does not learn from the data.
When I use model.fit I get:
Trainable set to false:
Epoch 1/51
2500/2500 [==============================] - 49s 20ms/step - loss: 1.4319 - accuracy: 0.5466 - val_loss: 1.3951 - val_accuracy: 0.5693
Epoch 2/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.1508 - accuracy: 0.6009 - val_loss: 0.7832 - val_accuracy: 0.6023
Epoch 3/51
2500/2500 [==============================] - 48s 19ms/step - loss: 1.0816 - accuracy: 0.6256 - val_loss: 0.6782 - val_accuracy: 0.6153
Epoch 4/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.0396 - accuracy: 0.6450 - val_loss: 1.3045 - val_accuracy: 0.6103
The model trains to about 65% accuracy within a few epochs. But with model2, which should be able to make even better predictions (since there are more trainable parameters), I get:
Epoch 1/5
2500/2500 [==============================] - 226s 90ms/step - loss: 2.3028 - accuracy: 0.0980 - val_loss: 2.3038 - val_accuracy: 0.1008
Epoch 2/5
2500/2500 [==============================] - 311s 124ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.2988 - val_accuracy: 0.1017
Epoch 3/5
2500/2500 [==============================] - 306s 123ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.3052 - val_accuracy: 0.0997
Epoch 4/5
2500/2500 [==============================] - 321s 129ms/step - loss: 2.3029 - accuracy: 0.0972 - val_loss: 2.3028 - val_accuracy: 0.0997
Epoch 5/5
2500/2500 [==============================] - 300s 120ms/step - loss: 2.3028 - accuracy: 0.0988 - val_loss: 2.3027 - val_accuracy: 0.1007
Then when I try to compute the gradients of the weights on my data, I get only zeros. I understand that it can take a long time to train a big network like VGG to an optimum, but given that the computed gradients for the last 3 layers should be very similar in both cases, why is the accuracy so low? And more training doesn't improve it.
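(For context, a check along these lines is one way to inspect those gradients; this is a minimal tf.GradientTape sketch, where x_batch and y_batch are placeholder names for one batch of data:)

# Minimal sketch of inspecting weight gradients in TF2 / tf.keras.
# x_batch and y_batch are placeholder names for one batch of data.
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
with tf.GradientTape() as tape:
    preds = model2(x_batch, training=True)
    loss = loss_fn(y_batch, preds)
grads = tape.gradient(loss, model2.trainable_variables)
# When training has collapsed as above, these norms come out (near-)zero.
print([float(tf.norm(g)) for g in grads[-6:]])  # the last three Dense layers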
Try this:
- Train the first model with trainable set to False. You don't have to train it to saturation, so I would start with your 5 epochs.
- Go back and set trainable to True for all the vgg19 parameters. Then, per the documentation, you can rebuild and recompile the model for the changes to take effect.
- Continue training the rebuilt model, which now has all of its parameters available for tuning.
It is very common in Transfer Learning to completely freeze the transferred layers in order to preserve them. In the early stages of training, your additional layers don't yet know what to do, which means noisy gradients by the time they reach the transferred layers. Those noisy updates will quickly "detune" the transferred weights away from their previously well-tuned values.
Putting it all together in code, it would look something like this.
# Original code. Transfer VGG and freeze the weights.
vgg19 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
    layer.trainable = False

model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
model.fit()

# New second stage: unfreeze and continue training.
for layer in vgg19.layers:
    layer.trainable = True
full_model = Sequential(layers=model.layers)
full_model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
full_model.fit()
You may want to tune the learning rate for the fine-tuning stage. It's not strictly necessary to start with, just something to keep in mind.
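As a minimal sketch of what that could look like (the 1e-5 rate is an illustrative assumption, not a value from the answer), you could recompile the unfrozen model with a smaller step size before the second fit:

# Sketch: recompile the unfrozen model with a smaller learning rate, so the
# fine-tuning updates nudge rather than overwrite the transferred weights.
# The 1e-5 value is an illustrative assumption.
fine_tune_opt = Adam(learning_rate=1e-5, beta_1=0.9)
full_model.compile(
    loss='categorical_crossentropy',
    optimizer=fine_tune_opt,
    metrics=['accuracy'])
full_model.fit()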
A third option is to use discriminative learning rates, as introduced by Jeremy Howard and Sebastian Ruder in the ULMFiT paper. The idea is that, in Transfer Learning, you usually want the later layers to learn faster than the earlier, transferred layers, so you actually set the learning rates to be different for different sets of layers. The fastai library has a PyTorch implementation that works by dividing the model into "layer groups" and allowing different hyperparameters for each.
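Keras has no built-in per-layer learning rates, but the idea can be sketched with a custom training step and two optimizers, assuming the full_model and vgg19 objects from the block above (the two rate values are illustrative assumptions):

# Sketch of discriminative learning rates in tf.keras: a slow optimizer for
# the transferred VGG19 layers and a faster one for the new head.
import tensorflow as tf

slow_opt = tf.keras.optimizers.Adam(learning_rate=1e-5)  # transferred layers
fast_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)  # new head
loss_fn = tf.keras.losses.CategoricalCrossentropy()

n_base = len(vgg19.layers)
base_vars = [v for layer in full_model.layers[:n_base]
             for v in layer.trainable_variables]
head_vars = [v for layer in full_model.layers[n_base:]
             for v in layer.trainable_variables]

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, full_model(x, training=True))
    grads = tape.gradient(loss, base_vars + head_vars)
    # Small steps for the transferred layers, larger steps for the head.
    slow_opt.apply_gradients(zip(grads[:len(base_vars)], base_vars))
    fast_opt.apply_gradients(zip(grads[len(base_vars):], head_vars))
    return loss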