Retraining Custom VGGFace Model Yields Random Results
I am trying to compare a fine-tuned VGGFace model (using the pretrained VGGFace weights) against a fully retrained one. When I use the fine-tuned model, I get decent accuracy scores. However, when I retrain the whole model by unfreezing the weights, accuracy drops to near random.

My guess is that it is the small dataset? I know VGGFace was trained on millions of samples, while my dataset has only 1,400 samples (700 per class for a binary classification problem). But I want to make sure I am joining the VGGFace model and my custom model correctly. Could the learning rate also be too high?

The model is set up with the following code.
from keras import backend as K
from keras import optimizers
from keras.models import Model
from keras.layers import Dense, Activation
from keras.callbacks import EarlyStopping
from keras_vggface.vggface import VGGFace

def Train_VGG_Model(train_layers=False):
    print('=' * 65)
    K.clear_session()

    # Take the pretrained VGGFace backbone up to fc7/relu and stack a
    # custom binary-classification head on top of it.
    vggface_model = VGGFace(model='vgg16')
    x = vggface_model.get_layer('fc7/relu').output
    x = Dense(512, name='custom_fc8')(x)
    x = Activation('relu', name='custom_fc8/relu')(x)
    x = Dense(64, name='custom_fc9')(x)
    x = Activation('relu', name='custom_fc9/relu')(x)
    x = Dense(1, name='custom_fc10')(x)
    out = Activation('sigmoid', name='custom_fc10/sigmoid')(x)
    custom_model = Model(vggface_model.input, out,
                         name='Custom VGGFace Model')

    # Custom head layers are always trainable; the backbone layers are
    # trainable only when train_layers=True.
    for layer in custom_model.layers:
        layer.trainable = True if 'custom_' in layer.name else train_layers
        print('Layer name:', layer.name,
              '... Trainable:', layer.trainable)

    print('=' * 65)
    opt = optimizers.Adam(lr=1e-5)
    custom_model.compile(loss='binary_crossentropy',
                         metrics=['accuracy'],
                         optimizer=opt)
    custom_model.summary()
    return custom_model

callbacks = [EarlyStopping(monitor='val_loss', patience=1, mode='auto')]
model = Train_VGG_Model(train_layers=True)  # True = unfreeze the backbone
model.fit(X_train, y_train, batch_size=32, epochs=100,
          callbacks=callbacks, validation_data=(X_valid, y_valid))
Output:
Layer name: input_1 ... Trainable: True
Layer name: conv1_1 ... Trainable: True
Layer name: conv1_2 ... Trainable: True
Layer name: pool1 ... Trainable: True
Layer name: conv2_1 ... Trainable: True
Layer name: conv2_2 ... Trainable: True
Layer name: pool2 ... Trainable: True
Layer name: conv3_1 ... Trainable: True
Layer name: conv3_2 ... Trainable: True
Layer name: conv3_3 ... Trainable: True
Layer name: pool3 ... Trainable: True
Layer name: conv4_1 ... Trainable: True
Layer name: conv4_2 ... Trainable: True
Layer name: conv4_3 ... Trainable: True
Layer name: pool4 ... Trainable: True
Layer name: conv5_1 ... Trainable: True
Layer name: conv5_2 ... Trainable: True
Layer name: conv5_3 ... Trainable: True
Layer name: pool5 ... Trainable: True
Layer name: flatten ... Trainable: True
Layer name: fc6 ... Trainable: True
Layer name: fc6/relu ... Trainable: True
Layer name: fc7 ... Trainable: True
Layer name: fc7/relu ... Trainable: True
Layer name: custom_fc8 ... Trainable: True
Layer name: custom_fc8/relu ... Trainable: True
Layer name: custom_fc9 ... Trainable: True
Layer name: custom_fc9/relu ... Trainable: True
Layer name: custom_fc10 ... Trainable: True
Layer name: custom_fc10/sigmoid ... Trainable: True
=================================================================
Model: "Custom VGGFace Model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
conv1_1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
conv1_2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
pool1 (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
conv2_1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
conv2_2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
pool2 (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
conv3_1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
conv3_2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
conv3_3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
pool3 (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
conv4_1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
conv4_2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
conv4_3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
pool4 (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
conv5_1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
conv5_2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
conv5_3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
pool5 (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc6 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc6/relu (Activation) (None, 4096) 0
_________________________________________________________________
fc7 (Dense) (None, 4096) 16781312
_________________________________________________________________
fc7/relu (Activation) (None, 4096) 0
_________________________________________________________________
custom_fc8 (Dense) (None, 512) 2097664
_________________________________________________________________
custom_fc8/relu (Activation) (None, 512) 0
_________________________________________________________________
custom_fc9 (Dense) (None, 64) 32832
_________________________________________________________________
custom_fc9/relu (Activation) (None, 64) 0
_________________________________________________________________
custom_fc10 (Dense) (None, 1) 65
_________________________________________________________________
custom_fc10/sigmoid (Activat (None, 1) 0
=================================================================
Total params: 136,391,105
Trainable params: 136,391,105
Non-trainable params: 0
_________________________________________________________________
Train on 784 samples, validate on 336 samples
Epoch 1/100
784/784 [==============================] - 235s 300ms/step - loss: 0.7987 - accuracy: 0.5051 - val_loss: 0.6932 - val_accuracy: 0.5149
Epoch 2/100
784/784 [==============================] - 233s 298ms/step - loss: 0.6935 - accuracy: 0.4605 - val_loss: 0.6932 - val_accuracy: 0.4792
Epoch 3/100
784/784 [==============================] - 236s 301ms/step - loss: 0.6932 - accuracy: 0.5089 - val_loss: 0.6932 - val_accuracy: 0.4792
280/280 [==============================] - 12s 45ms/step
Thanks in advance, and apologies if my question does not make sense. I am new to this.
If you already have weights trained on a sufficiently large dataset, it is usually best to fine-tune/train only the last few layers and keep the earlier ones frozen.

In any convolutional network, the initial layers act as feature extractors, and a good pretrained model has already learned strong features from a large enough dataset.

Once you retrain the entire model, you throw all of that away. The model will try to shift toward your new dataset (which is probably smaller and not as well distributed as the original one), and that makes it perform poorly.

If you really want to train the whole model, another thing you can try is selecting a very small learning rate (1e-5 to 1e-6) for the initial layers and something like 1e-3 for the last layers.
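The per-layer learning-rate idea can be illustrated with a toy example. Note this is not a stock Keras feature (a plain Keras optimizer applies one global rate to every trainable weight; per-layer rates need something like a custom training loop or a multi-optimizer wrapper). The sketch below is just a hand-computed gradient step on a hypothetical two-parameter linear model, showing how a tiny rate keeps the "early" weight near its pretrained value while the head moves freely:

```python
import numpy as np

# Hypothetical two-layer linear model: y_pred = w2 * (w1 * x).
# Pretend w1 is a pretrained "early" layer and w2 is the new head.
w1, w2 = 1.0, 1.0
x, y_true = 2.0, 8.0       # a single made-up training example

# Forward pass and squared-error loss
h = w1 * x
y_pred = w2 * h
loss = (y_pred - y_true) ** 2

# Backward pass (chain rule by hand)
dloss = 2.0 * (y_pred - y_true)
grad_w2 = dloss * h        # dL/dw2
grad_w1 = dloss * w2 * x   # dL/dw1

# Discriminative learning rates: tiny for the early layer, larger for the head
lr_early, lr_late = 1e-6, 1e-3
w1 -= lr_early * grad_w1
w2 -= lr_late * grad_w2

print(w1, w2)  # w1 barely moves from 1.0; w2 takes a ~1000x larger step
```

The same asymmetry is what protects the pretrained conv filters: with a rate a few orders of magnitude smaller, the backbone is only nudged while the freshly initialized head does most of the adapting.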