Using VGG16 for bounding box prediction on my own dataset

After building a VGG16-based classifier, I want to predict a bounding box around the detected object.

I found online that I can do this by removing the layers after the last MaxPool and adding some fully connected layers:
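from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# vgg16 is the convolutional base (no top layers); its output is block5_pool
flatten = Flatten()(vgg16.output)
# fully connected head that regresses the 4 box values
bboxhead = Dense(128, activation="relu")(flatten)
bboxhead = Dense(64, activation="relu")(bboxhead)
bboxhead = Dense(32, activation="relu")(bboxhead)
bboxhead = Dense(4, activation="relu")(bboxhead)
box_model = Model(inputs=vgg16.input, outputs=bboxhead)
box_model.summary()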

The model looks like this, which matches what I found in my search:

Model: "box_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 dense (Dense)               (None, 128)               3211392   
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 32)                2080      
                                                                 
 dense_3 (Dense)             (None, 4)                 132       
                                                                 
=================================================================
Total params: 17,936,548
Trainable params: 3,221,860
Non-trainable params: 14,714,688
_________________________________________________________________
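
The parameter counts of the new head can be checked by hand: a Dense layer has inputs × units weights plus one bias per unit. A quick sketch (the helper name `dense_params` is only for illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


flatten_size = 7 * 7 * 512  # block5_pool output flattened -> 25088
head = [
    dense_params(flatten_size, 128),  # 3211392
    dense_params(128, 64),            # 8256
    dense_params(64, 32),             # 2080
    dense_params(32, 4),              # 132
]
print(sum(head))  # 3221860, the "Trainable params" line in the summary
```

Only the head is trainable here; the 14,714,688 non-trainable parameters are the VGG16 convolutional layers.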

Then I train the model:

from tensorflow.keras.optimizers import Adam

opt = Adam(1e-4)
box_model.compile(loss='mse', optimizer=opt)

# train_gen and val_gen yield batches themselves, so batch_size is left out of fit()
num_epochs = 30
history = box_model.fit(train_gen, validation_data=val_gen, epochs=num_epochs, verbose=1)

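However, I noticed that the last Dense layer has 4 units, which does not match my number of classes (5). After I changed it to 5 units, the code runs, but the model doesn't learn anything. The 5-value output array makes no sense (all zeros).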

Or is my implementation incorrect?

In short: your implementation is fine, but your data is wrong.

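To train a new output, you need new labels. The input doesn't need to change, but you need to somehow obtain the x, y, height, and width of the bounding boxes you are trying to detect. If the dataset doesn't provide them, you will have to annotate them yourself.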
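
As a rough illustration of what such labels could look like, the sketch below converts a pixel-space box into four values scaled to [0, 1]. The function name `normalize_box`, the sample annotation, and the top-left (x, y, width, height) convention are assumptions for the example, not something your dataset or Keras prescribes.

```python
def normalize_box(x, y, width, height, img_width, img_height):
    """Scale a pixel-space box (top-left x, y, width, height) to the [0, 1] range."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    return [x / img_width, y / img_height, width / img_width, height / img_height]


# Hypothetical annotation for a 224x224 image: box at (56, 28), 112 wide, 84 tall
label = normalize_box(56, 28, 112, 84, img_width=224, img_height=224)
print(label)  # [0.25, 0.125, 0.5, 0.375]
```

Each image then gets one such 4-value vector as its target, which is why the last Dense layer has 4 units.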

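If you want to train on bounding box coordinates, your labels need to be bounding box coordinates. You can't keep training with the dataset's class labels. In supervised learning, whatever you want the model to learn is exactly what you have to provide as labels.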
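
To make the mismatch concrete, here is a small sketch that puts a one-hot class label for 5 classes next to a 4-value box label and computes the mean squared error for a single sample. The helper `mse` and all sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error for one sample; both vectors must have the same length."""
    if len(y_true) != len(y_pred):
        raise ValueError(
            f"label has {len(y_true)} values but the model outputs {len(y_pred)}"
        )
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class_label = [0, 0, 1, 0, 0]          # one-hot class label (5 classes): says what, not where
box_label = [0.25, 0.125, 0.5, 0.375]  # normalized x, y, width, height: says where
prediction = [0.25, 0.25, 0.5, 0.5]    # hypothetical output of the 4-unit head

print(mse(box_label, prediction))      # 0.0078125
```

Widening the head to 5 units makes the shapes line up with `class_label`, but the network is then regressing class indicators, so its outputs carry no information about where the object is.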