Fine-tuning VGG-16 in Keras: slow training
I am trying to fine-tune the last two layers of a VGG model on the LFW dataset. I changed the softmax layer's dimensions by removing the original layer and adding a softmax layer with 19 outputs, since there are 19 classes I am trying to train on.
I also want to fine-tune the last fully connected layer in order to build a "custom feature extractor".
I mark the layers I want to be non-trainable like this:
for layer in model.layers:
    layer.trainable = False
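(Note that the loop above, as written, marks every layer non-trainable. A sketch of freezing only the layers up to a chosen cut-off, which matches the trainable-parameter count in the summary further down, could look like the following. It uses a minimal stand-in layer class so the logic runs without Keras; with a real model you would iterate over `model.layers` the same way.)

```python
# Sketch: freeze all layers before a named layer, leave the rest trainable.
# FakeLayer is a hypothetical stand-in for a Keras layer (name + trainable flag).

class FakeLayer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_until(layers, first_trainable_name):
    trainable = False
    for layer in layers:
        if layer.name == first_trainable_name:
            trainable = True          # from this layer on, keep trainable
        layer.trainable = trainable

layers = [FakeLayer(n) for n in ["conv1_1", "conv1_2", "fc6", "fc7", "fc8"]]
freeze_until(layers, "fc7")           # everything before fc7 is frozen
print([(l.name, l.trainable) for l in layers])
# [('conv1_1', False), ('conv1_2', False), ('fc6', False), ('fc7', True), ('fc8', True)]
```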
Using a GPU, each epoch takes about 1 hour to train, with 19 classes and at least 40 images per class.
Since I do not have many samples, this training behavior seems a bit strange to me.
Does anyone know why this is happening?
Here is the log:
Image shape: (224, 224, 3)
Number of classes: 19
K.image_dim_ordering: th
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_1 (InputLayer) (None, 3, 224, 224) 0
____________________________________________________________________________________________________
conv1_1 (Convolution2D) (None, 64, 224, 224) 1792 input_1[0][0]
____________________________________________________________________________________________________
conv1_2 (Convolution2D) (None, 64, 224, 224) 36928 conv1_1[0][0]
____________________________________________________________________________________________________
pool1 (MaxPooling2D) (None, 64, 112, 112) 0 conv1_2[0][0]
____________________________________________________________________________________________________
conv2_1 (Convolution2D) (None, 128, 112, 112) 73856 pool1[0][0]
____________________________________________________________________________________________________
conv2_2 (Convolution2D) (None, 128, 112, 112) 147584 conv2_1[0][0]
____________________________________________________________________________________________________
pool2 (MaxPooling2D) (None, 128, 56, 56) 0 conv2_2[0][0]
____________________________________________________________________________________________________
conv3_1 (Convolution2D) (None, 256, 56, 56) 295168 pool2[0][0]
____________________________________________________________________________________________________
conv3_2 (Convolution2D) (None, 256, 56, 56) 590080 conv3_1[0][0]
____________________________________________________________________________________________________
conv3_3 (Convolution2D) (None, 256, 56, 56) 590080 conv3_2[0][0]
____________________________________________________________________________________________________
pool3 (MaxPooling2D) (None, 256, 28, 28) 0 conv3_3[0][0]
____________________________________________________________________________________________________
conv4_1 (Convolution2D) (None, 512, 28, 28) 1180160 pool3[0][0]
____________________________________________________________________________________________________
conv4_2 (Convolution2D) (None, 512, 28, 28) 2359808 conv4_1[0][0]
____________________________________________________________________________________________________
conv4_3 (Convolution2D) (None, 512, 28, 28) 2359808 conv4_2[0][0]
____________________________________________________________________________________________________
pool4 (MaxPooling2D) (None, 512, 14, 14) 0 conv4_3[0][0]
____________________________________________________________________________________________________
conv5_1 (Convolution2D) (None, 512, 14, 14) 2359808 pool4[0][0]
____________________________________________________________________________________________________
conv5_2 (Convolution2D) (None, 512, 14, 14) 2359808 conv5_1[0][0]
____________________________________________________________________________________________________
conv5_3 (Convolution2D) (None, 512, 14, 14) 2359808 conv5_2[0][0]
____________________________________________________________________________________________________
pool5 (MaxPooling2D) (None, 512, 7, 7) 0 conv5_3[0][0]
____________________________________________________________________________________________________
flatten (Flatten) (None, 25088) 0 pool5[0][0]
____________________________________________________________________________________________________
fc6 (Dense) (None, 4096) 102764544 flatten[0][0]
____________________________________________________________________________________________________
fc7 (Dense) (None, 4096) 16781312 fc6[0][0]
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 4096) 16384 fc7[0][0]
____________________________________________________________________________________________________
fc8 (Dense) (None, 19) 77843 batchnormalization_1[0][0]
====================================================================================================
Total params: 134,354,771
Trainable params: 16,867,347
Non-trainable params: 117,487,424
____________________________________________________________________________________________________
None
Train on 1120 samples, validate on 747 samples
Epoch 1/20
1120/1120 [==============================] - 7354s - loss: 2.9517 - acc: 0.0714 - val_loss: 2.9323 - val_acc: 0.2316
Epoch 2/20
1120/1120 [==============================] - 7356s - loss: 2.8053 - acc: 0.1732 - val_loss: 2.9187 - val_acc: 0.3614
Epoch 3/20
1120/1120 [==============================] - 7358s - loss: 2.6727 - acc: 0.2643 - val_loss: 2.9034 - val_acc: 0.3882
Epoch 4/20
1120/1120 [==============================] - 7361s - loss: 2.5565 - acc: 0.3071 - val_loss: 2.8861 - val_acc: 0.4016
Epoch 5/20
1120/1120 [==============================] - 7360s - loss: 2.4597 - acc: 0.3518 - val_loss: 2.8667 - val_acc: 0.4043
Epoch 6/20
1120/1120 [==============================] - 7363s - loss: 2.3827 - acc: 0.3714 - val_loss: 2.8448 - val_acc: 0.4163
Epoch 7/20
1120/1120 [==============================] - 7364s - loss: 2.3108 - acc: 0.4045 - val_loss: 2.8196 - val_acc: 0.4244
Epoch 8/20
1120/1120 [==============================] - 7377s - loss: 2.2463 - acc: 0.4268 - val_loss: 2.7905 - val_acc: 0.4324
Epoch 9/20
1120/1120 [==============================] - 7373s - loss: 2.1824 - acc: 0.4563 - val_loss: 2.7572 - val_acc: 0.4404
Epoch 10/20
1120/1120 [==============================] - 7373s - loss: 2.1313 - acc: 0.4732 - val_loss: 2.7190 - val_acc: 0.4471
Epoch 11/20
1120/1120 [==============================] - 7440s - loss: 2.0766 - acc: 0.5036 - val_loss: 2.6754 - val_acc: 0.4565
Epoch 12/20
1120/1120 [==============================] - 7414s - loss: 2.0323 - acc: 0.5170 - val_loss: 2.6263 - val_acc: 0.4565
Epoch 13/20
1120/1120 [==============================] - 7413s - loss: 1.9840 - acc: 0.5420 - val_loss: 2.5719 - val_acc: 0.4592
Epoch 14/20
1120/1120 [==============================] - 7414s - loss: 1.9467 - acc: 0.5464 - val_loss: 2.5130 - val_acc: 0.4592
Epoch 15/20
1120/1120 [==============================] - 7412s - loss: 1.9039 - acc: 0.5652 - val_loss: 2.4513 - val_acc: 0.4592
Epoch 16/20
1120/1120 [==============================] - 7413s - loss: 1.8716 - acc: 0.5723 - val_loss: 2.3906 - val_acc: 0.4578
Epoch 17/20
1120/1120 [==============================] - 7415s - loss: 1.8214 - acc: 0.5866 - val_loss: 2.3319 - val_acc: 0.4538
Epoch 18/20
1120/1120 [==============================] - 7416s - loss: 1.7860 - acc: 0.5982 - val_loss: 2.2789 - val_acc: 0.4538
Epoch 19/20
1120/1120 [==============================] - 7430s - loss: 1.7623 - acc: 0.5973 - val_loss: 2.2322 - val_acc: 0.4538
Epoch 20/20
1120/1120 [==============================] - 7856s - loss: 1.7222 - acc: 0.6170 - val_loss: 2.1913 - val_acc: 0.4538
Accuracy: 45.38%
The results are not good, and I cannot train it on more data because it takes far too long. Any ideas?
Note that you are feeding roughly 19 * 40 < 800 examples to train 16,867,347 parameters. That is on the order of 2e4 parameters per example, which simply cannot work well. Try removing all the fully connected layers (the Dense layers on top) and putting in much smaller Dense layers instead, e.g. around 50 neurons each. In my opinion this should both improve your accuracy and speed up training.
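To see why, here is a quick back-of-the-envelope comparison of the current head against a smaller one. The layer sizes come from the model summary above; the 50-unit layer is just the illustrative size suggested in this answer:

```python
# Parameter count of a Dense layer: (inputs + 1) * units  (the +1 is the bias).
features = 512 * 7 * 7                  # 25088 outputs of pool5/flatten

# Current head: fc6(4096) -> fc7(4096) -> fc8(19)
fc6 = (features + 1) * 4096             # 102,764,544
fc7 = (4096 + 1) * 4096                 #  16,781,312
fc8 = (4096 + 1) * 19                   #      77,843
current_head = fc6 + fc7 + fc8

# Suggested head: one small Dense(50) followed by the 19-way softmax
small = (features + 1) * 50             #   1,254,450
out = (50 + 1) * 19                     #         969
suggested_head = small + out

print(current_head, suggested_head, current_head // suggested_head)
# 119623699 1255419 95  -> roughly 95x fewer head parameters
```

The per-layer counts match the Param # column of the summary, so shrinking the head removes the vast majority of the network's parameters.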