My own implementation of Fast R-CNN does not perform well on balanced data


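Training uses 700 images, with 64 RoIs sampled from each image to form a mini-batch; with the batch size set to 2, training completes in 350 steps. For R-CNN, by contrast, every object is cropped and resized into its own 224*224 image, which gives 64*700 = 44800 images, each carrying more information and features than a 7*7 pooled feature map. I guess this is why the model does not seem to fit, even though R-CNN might train well on the same data.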
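The arithmetic behind that comparison can be written out as a quick sanity check. This is only a sketch: the 3 input channels (RGB) and the 512 feature channels (the VGG16 conv depth mentioned elsewhere in this post) are assumptions, and counting raw values is only a rough proxy for "information".

```python
# Numbers taken from the paragraph above.
num_images = 700
rois_per_image = 64
images_per_batch = 2

# Fast R-CNN: one step consumes `images_per_batch` images and all their RoIs.
steps_per_pass = num_images // images_per_batch    # 350
rois_per_step = rois_per_image * images_per_batch  # 128

# R-CNN: every RoI becomes its own resized crop.
rcnn_crops = num_images * rois_per_image           # 44800

# Raw values seen by the classifier per sample (assumed channel counts).
values_per_rcnn_crop = 224 * 224 * 3               # 150528
values_per_pooled_roi = 7 * 7 * 512                # 25088

print(steps_per_pass, rois_per_step, rcnn_crops)
print(values_per_rcnn_crop / values_per_pooled_roi)  # 6.0
```

Both setups see the same 44800 RoIs in total; what differs is how many values each RoI contributes and how many gradient steps they are spread over.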

[0.5233287 0.4766713] not plane
[0.53725046 0.46274957] not plane


I followed this structure, which is used in many repos on GitHub, but the accuracy does not improve:

def build_model():
    pooled_square_size = 7
    num_rois = 32
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
    x = model_cnn.layers[17].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    x = TimeDistributed(Flatten())(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(2, activation='softmax', kernel_initializer='zero'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(lr=0.0001)
    model_final.save("TrainedModels" + slash + "FastRCNN.h5")


100/100 [==============================] - ETA: 0s - loss: 0.5556 - accuracy: 0.7681
Epoch 00001: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 412ms/step - loss: 0.5556 - accuracy: 0.7681
Epoch 2/100
100/100 [==============================] - ETA: 0s - loss: 0.5223 - accuracy: 0.7910
Epoch 00002: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 414ms/step - loss: 0.5223 - accuracy: 0.7910
Epoch 3/100
100/100 [==============================] - ETA: 0s - loss: 0.5340 - accuracy: 0.7797
Epoch 00003: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 416ms/step - loss: 0.5340 - accuracy: 0.7797
Epoch 4/100
100/100 [==============================] - ETA: 0s - loss: 0.5309 - accuracy: 0.7825
Epoch 00004: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 427ms/step - loss: 0.5309 - accuracy: 0.7825
Epoch 5/100
100/100 [==============================] - ETA: 0s - loss: 0.5257 - accuracy: 0.7840
Epoch 00005: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 434ms/step - loss: 0.5257 - accuracy: 0.7840
Epoch 6/100
100/100 [==============================] - ETA: 0s - loss: 0.5181 - accuracy: 0.7928
Epoch 00006: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 423ms/step - loss: 0.5181 - accuracy: 0.7928
Epoch 7/100
100/100 [==============================] - ETA: 0s - loss: 0.5483 - accuracy: 0.7712
Epoch 00007: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 418ms/step - loss: 0.5483 - accuracy: 0.7712
Epoch 8/100
100/100 [==============================] - ETA: 0s - loss: 0.5282 - accuracy: 0.7832
Epoch 00008: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 429ms/step - loss: 0.5282 - accuracy: 0.7832
Epoch 9/100
100/100 [==============================] - ETA: 0s - loss: 0.5385 - accuracy: 0.7765
Epoch 00009: saving model to TrainedModels\FastRCNN.h5






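I wrote a binary classification model based on Fast R-CNN to detect planes in photos. The training dataset was generated with Selective Search. When I use a dataset with a Negative/Positive ratio of about 1, the model only reaches an accuracy of about 0.6 on the training set. When I raise the N/P ratio so that it is closer to the original ratio produced by Selective Search, the training accuracy can reach 0.9, but the model performs poorly when predicting on the test set. During training, the training accuracy is always the same after each epoch, and TensorBoard shows that the layer weights do not change from epoch to epoch: TensorBoard Histogram of Weights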
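One way to put the accuracy figures above in context is to compare them with what a model would score by always predicting the larger class. The sketch below computes that baseline from the N/P ratio; the helper name and the ratios 3 and 9 are illustrative only, since the actual ratio of the Selective Search output is not stated here.

```python
def majority_baseline_accuracy(neg_pos_ratio):
    """Accuracy of always predicting the larger class, given N/P = neg_pos_ratio."""
    negatives = float(neg_pos_ratio)
    positives = 1.0
    return max(negatives, positives) / (negatives + positives)


# Ratio 1 is the balanced case from the post; 3 and 9 are hypothetical.
for ratio in (1, 3, 9):
    print(ratio, majority_baseline_accuracy(ratio))
```

With a ratio of 1 the baseline is 0.5, and with a ratio of 9 it is already 0.9, so a high training accuracy on an imbalanced set does not by itself show that the model has learned to recognise planes.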

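This is the basic structure of my model. The feature extractor is VGG16, which outputs a 28*28 feature map to the ROI Pooling layer. I tried changing the activation from ReLU to SELU, but it did not help: Model Structure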

These are the input image and its feature maps before (28*28*512) and after (32*14*14*512) the ROI Pooling layer:

Input Image

Feature Map Before ROI_P

One Typical Feature Map of ROI After ROI_P

Another Typical Feature Map of ROI After ROI_P
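For reference when reading those feature maps, here is a minimal sketch of what RoI max pooling does to a single channel. It is written with plain Python lists so it runs without TensorFlow, and it is not the `RoiPoolingConv` layer used in this post: the function name, the exclusive `(x1, y1, x2, y2)` convention in feature-map cells, and the bin rounding are all assumptions made for illustration.

```python
import math


def roi_max_pool_2d(feature_map, roi, pooled_size):
    """Max-pool one RoI of a 2-D feature map to pooled_size x pooled_size.

    feature_map is a list of rows; roi is (x1, y1, x2, y2) in feature-map
    cells, with x2 and y2 exclusive (an assumed convention).
    """
    x1, y1, x2, y2 = roi
    height, width = len(feature_map), len(feature_map[0])
    bin_h = (y2 - y1) / pooled_size
    bin_w = (x2 - x1) / pooled_size
    pooled = []
    for i in range(pooled_size):
        # Row range of bin i, clamped so that it covers at least one cell.
        r0 = min(math.floor(y1 + i * bin_h), height - 1)
        r1 = min(max(math.ceil(y1 + (i + 1) * bin_h), r0 + 1), height)
        row = []
        for j in range(pooled_size):
            c0 = min(math.floor(x1 + j * bin_w), width - 1)
            c1 = min(max(math.ceil(x1 + (j + 1) * bin_w), c0 + 1), width)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fm = [[0, 1, 2, 3],
      [4, 5, 6, 7],
      [8, 9, 10, 11],
      [12, 13, 14, 15]]
print(roi_max_pool_2d(fm, (0, 0, 4, 4), 2))  # [[5, 7], [13, 15]]
print(roi_max_pool_2d(fm, (1, 1, 2, 2), 2))  # [[5, 5], [5, 5]]
```

The second call shows that when a RoI is smaller than the pooled size, the same cell is repeated across several output bins. On a 28*28 map pooled to 14*14, any RoI narrower than 14 cells would behave this way under this rounding scheme.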


def build_model():
    num_rois = 32
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
    x = model_cnn.layers[13].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    for layer in model_cnn.layers[15:]:
        x = TimeDistributed(layer)(x)
    x = TimeDistributed(Dense(512, activation='sigmoid'))(x)
    x = TimeDistributed(Dense(2, activation='softmax'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(lr=0.0001)
    model_final.save("TrainedModels" + slash + "FastRCNN.h5")

The full code can be seen here: Github Repo


I strongly suspect that there is something odd about this VGG16:





def call(self, x, mask=None):
        assert (len(x) == 2)
        # x[0] is image with shape (rows, cols, channels)
        img = x[0]
        # x[1] is roi with shape (num_rois,4) with ordering (x1,y1,x2,y2)
        rois = x[1]

        input_shape = img.shape

        outputs = []

        x1 = rois[:, :, 0]
        y1 = rois[:, :, 1]
        x2 = rois[:, :, 2]
        y2 = rois[:, :, 3]


def call(self, x, mask=None):
        assert (len(x) == 2)

        # x[0] is image with shape (rows, cols, channels)
        img = x[0]

        # x[1] is roi with shape (num_rois,4) with ordering (x,y,w,h)
        rois = x[1]

        input_shape = img.shape

        outputs = []

        for roi_idx in range(self.num_rois):
            x1 = rois[0, roi_idx, 0]
            y1 = rois[0, roi_idx, 1]
            x2 = rois[0, roi_idx, 2]
            y2 = rois[0, roi_idx, 3]
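
The comments in the two `call` variants above describe different RoI orderings, `(x1,y1,x2,y2)` in one and `(x,y,w,h)` in the other, while both read the four values into `x1, y1, x2, y2`. Which ordering the stored RoIs actually use is not shown here. If they were stored as `(x, y, w, h)`, a conversion along the lines of this sketch would be needed before treating them as corners; both helper names are hypothetical.

```python
def xywh_to_corners(roi):
    """Convert (x, y, w, h) to (x1, y1, x2, y2)."""
    x, y, w, h = roi
    return (x, y, x + w, y + h)


def corners_to_xywh(roi):
    """Convert (x1, y1, x2, y2) to (x, y, w, h)."""
    x1, y1, x2, y2 = roi
    return (x1, y1, x2 - x1, y2 - y1)


print(xywh_to_corners((2, 3, 4, 5)))  # (2, 3, 6, 8)
print(corners_to_xywh((2, 3, 6, 8)))  # (2, 3, 4, 5)
```

Reading a `(x, y, w, h)` box as corners would make any RoI whose width or height is smaller than its `x` or `y` offset look inverted or empty, so it is worth checking which convention the data generator writes.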

