My own implementation of Fast R-CNN cannot perform well on balanced data

2020.06.09

Training uses 700 images, sampling 64 RoIs from each image as one mini-batch; with the batch size set to 2, training finishes in 350 steps. For R-CNN, by contrast, every proposal is cropped out and resized into its own 224*224 image, which would give 64*700 = 44800 images, each carrying far more information and features than a 7*7 pooled feature map. I think that is why this model does not seem to fit the data well, even though R-CNN might train well on the same data.
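For reference, here is a minimal sketch (not the repo's actual loader; minibatch_generator and the array layout are made up for illustration) of how such a Fast R-CNN mini-batch is put together: 2 images per step with 64 sampled RoIs each, so 700 images take 700 / 2 = 350 steps per epoch.

import numpy as np

NUM_IMAGES = 700
BATCH_SIZE = 2          # images per mini-batch
ROIS_PER_IMAGE = 64     # RoIs sampled from each image

def minibatch_generator(images, rois_per_image):
    """images: list of same-sized (H, W, 3) arrays; rois_per_image: list of (N_i, 4) arrays."""
    for start in range(0, len(images), BATCH_SIZE):
        batch_imgs, batch_rois = [], []
        for i in range(start, min(start + BATCH_SIZE, len(images))):
            # sample a fixed number of RoIs from this image's proposals
            idx = np.random.choice(len(rois_per_image[i]), ROIS_PER_IMAGE, replace=True)
            batch_imgs.append(images[i])
            batch_rois.append(rois_per_image[i][idx])
        yield np.stack(batch_imgs), np.stack(batch_rois)

steps_per_epoch = NUM_IMAGES // BATCH_SIZE   # = 350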

===========================================================================

With fully balanced data, the accuracy drops to 0.53 (on the training data):

[0.5233287 0.4766713] not plane
[0.5281736 0.4718264] not plane
[0.53316545 0.46683457] not plane
[0.5287853 0.4712147] not plane
[0.52475226 0.47524777] not plane
[0.5293444 0.4706556] not plane
[0.52849627 0.47150376] not plane
[0.52786124 0.4721388 ] not plane
[0.52475226 0.47524777] not plane
[0.53224194 0.4677581 ] not plane
[0.5313732 0.4686268] not plane
[0.528143   0.47185704] not plane
[0.5233287 0.4766713] not plane
[0.5233839 0.4766161] not plane
[0.525427   0.47457302] not plane
[0.51949245 0.48050752] not plane
[0.52733606 0.47266394] not plane
[0.5268566  0.47314337] not plane
[0.52158654 0.47841352] not plane
[0.5412768  0.45872322] not plane
[0.5277719  0.47222808] not plane
[0.5223139 0.4776861] not plane
[0.5289101  0.47108996] not plane
[0.5207478  0.47925228] not plane
[0.52475226 0.47524777] not plane
[0.53407675 0.46592325] not plane
[0.53204036 0.4679596 ] not plane
[0.52786124 0.4721388 ] not plane
[0.52574503 0.47425497] not plane
[0.5271339  0.47286615] not plane
[0.5224281 0.4775719] not plane
[0.5233839 0.4766161] not plane
[0.5196227  0.48037735] not plane
[0.52554363 0.47445634] not plane
[0.52554363 0.47445634] not plane
[0.5446083  0.45539168] not plane
[0.53676397 0.46323603] not plane
[0.53944343 0.46055657] not plane
[0.520972 0.479028] not plane
[0.5492453  0.45075467] not plane
[0.52860624 0.47139376] not plane
[0.5273249 0.4726751] not plane
[0.52752113 0.4724789 ] not plane
[0.52902967 0.47097033] not plane
[0.5307333  0.46926668] not plane
[0.5322479  0.46775213] not plane
[0.53944343 0.46055657] not plane
[0.5499064 0.4500937] not plane
[0.5403881 0.4596119] not plane
[0.5203569  0.47964308] not plane
[0.52871954 0.47128052] not plane
[0.53245085 0.46754912] not plane
[0.5324656 0.4675344] not plane
[0.519246   0.48075405] not plane
[0.5299878  0.47001216] not plane
[0.527601   0.47239903] not plane
[0.5228142 0.4771858] not plane
[0.53725046 0.46274957] not plane

I think this network is just guessing rather than learning...
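One quick way to back that suspicion up with a number (summarize_predictions is a hypothetical helper, not part of the repo) is to summarize the spread of the softmax outputs above; a network that is actually learning should push probabilities away from the ~0.52/0.47 cluster seen here.

import numpy as np

def summarize_predictions(probs):
    """probs: (N, 2) softmax outputs, one row per RoI."""
    print("column means:", probs.mean(axis=0))   # close to a constant prior if guessing
    print("column stds :", probs.std(axis=0))    # near zero if every RoI gets the same answer
    print("max value   :", probs.max())          # never far from 0.5 in the log above

# example with a few rows copied from the output above
probs = np.array([[0.5233287, 0.4766713],
                  [0.5281736, 0.4718264],
                  [0.5412768, 0.45872322]])
summarize_predictions(probs)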

===========================================================================

2020.06.08

I followed this structure, which is used in many repos on GitHub, but the accuracy does not improve:

import tensorflow as tf
from tensorflow.keras.layers import Input, TimeDistributed, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from ROI_Pooling import RoiPoolingConv  # custom RoI pooling layer from this repo

def build_model():
    pooled_square_size = 7
    num_rois = 32
    # second input: (num_rois, 4) box coordinates per image
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    # VGG16 backbone pretrained on ImageNet
    model_cnn = tf.keras.applications.VGG16(
        include_top=True,
        weights='imagenet'
    )
    # layers[17] is block5_conv3: the 14*14*512 feature map fed to RoI pooling
    x = model_cnn.layers[17].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    # per-RoI classification head
    x = TimeDistributed(Flatten())(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(2, activation='softmax', kernel_initializer='zero'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(lr=0.0001)
    model_final.compile(
        loss=tf.keras.losses.CategoricalCrossentropy(),
        optimizer=opt,
        metrics=["accuracy"]
    )
    model_final.save("TrainedModels" + slash + "FastRCNN.h5")  # 'slash' is the path separator defined elsewhere in the repo
    return model_final
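
For context, here is a hedged usage sketch of how this two-input model might be driven during training (the repo's real generator and RoI coordinate convention may differ; the arrays below are dummies):

import numpy as np

model = build_model()

batch_images = np.zeros((2, 224, 224, 3), dtype=np.float32)               # 2 images
batch_rois = np.tile(np.array([0, 0, 7, 7], dtype=np.float32), (2, 32, 1))  # 32 dummy boxes per image, feature-map coordinates assumed
batch_labels = np.zeros((2, 32, 2), dtype=np.float32)                     # one-hot label per RoI
batch_labels[..., 0] = 1.0                                                # dummy "not plane" labels

model.fit([batch_images, batch_rois], batch_labels, epochs=1, batch_size=2)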

Training log:

100/100 [==============================] - ETA: 0s - loss: 0.5556 - accuracy: 0.7681
Epoch 00001: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 412ms/step - loss: 0.5556 - accuracy: 0.7681
Epoch 2/100
100/100 [==============================] - ETA: 0s - loss: 0.5223 - accuracy: 0.7910
Epoch 00002: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 414ms/step - loss: 0.5223 - accuracy: 0.7910
Epoch 3/100
100/100 [==============================] - ETA: 0s - loss: 0.5340 - accuracy: 0.7797
Epoch 00003: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 416ms/step - loss: 0.5340 - accuracy: 0.7797
Epoch 4/100
100/100 [==============================] - ETA: 0s - loss: 0.5309 - accuracy: 0.7825
Epoch 00004: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 427ms/step - loss: 0.5309 - accuracy: 0.7825
Epoch 5/100
100/100 [==============================] - ETA: 0s - loss: 0.5257 - accuracy: 0.7840
Epoch 00005: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 434ms/step - loss: 0.5257 - accuracy: 0.7840
Epoch 6/100
100/100 [==============================] - ETA: 0s - loss: 0.5181 - accuracy: 0.7928
Epoch 00006: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 423ms/step - loss: 0.5181 - accuracy: 0.7928
Epoch 7/100
100/100 [==============================] - ETA: 0s - loss: 0.5483 - accuracy: 0.7712
Epoch 00007: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 418ms/step - loss: 0.5483 - accuracy: 0.7712
Epoch 8/100
100/100 [==============================] - ETA: 0s - loss: 0.5282 - accuracy: 0.7832
Epoch 00008: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 429ms/step - loss: 0.5282 - accuracy: 0.7832
Epoch 9/100
100/100 [==============================] - ETA: 0s - loss: 0.5385 - accuracy: 0.7765
Epoch 00009: saving model to TrainedModels\FastRCNN.h5

References:

touchylk/cac

anandhupvr/fasterRcnn

xautdestiny/Model_Collection

===========================================================================

I wrote a binary-classification model for detecting airplanes in photos, based on Fast-RCNN. The training dataset is generated by Selective Search. When I use a dataset with a Negative/Positive ratio of about 1, the model only reaches roughly 0.6 accuracy on the training set; when I raise the N/P ratio closer to the original ratio produced by Selective Search, training accuracy can reach 0.9, but the model performs poorly when predicting on the test set. During training, the training accuracy is always nearly the same after each epoch, and with TensorBoard I can see that the layer weights do not change from epoch to epoch: TensorBoard Histogram of Weights
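For reference, here is a rough sketch of the kind of data pipeline described above, using OpenCV's Selective Search and an IoU threshold to assign plane / not-plane labels; the repo's actual code, thresholds, and box format may differ.

import cv2
import numpy as np

def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def label_proposals(image, gt_boxes, pos_thresh=0.5):
    # generate region proposals with Selective Search (opencv-contrib)
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()
    rects = ss.process()                      # (x, y, w, h) per proposal
    labels = []
    for (x, y, w, h) in rects:
        box = (x, y, x + w, y + h)
        is_plane = any(iou(box, gt) >= pos_thresh for gt in gt_boxes)
        labels.append(1 if is_plane else 0)   # 1 = plane, 0 = not plane
    # the raw N/P ratio here is heavily negative; balancing means subsampling negatives
    return rects, np.array(labels)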

Here is the basic structure of my model: feature extraction is VGG16, which outputs a 28*28 feature map to the RoI Pooling layer. I tried changing the activation from ReLU to SELU, but it did not help: Model Structure

Below are the input image, its feature map before the RoI Pooling layer (28*28*512), and typical per-RoI feature maps after RoI Pooling (32*14*14*512):

Input Image

Feature Map Before ROI_P

One Typical Feature Map of ROI After ROI_P

Another Typical Feature Map of ROI After ROI_P
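For anyone who wants to reproduce these visualizations, here is a hedged sketch of how the 28*28*512 map before RoI pooling can be pulled out of VGG16 (layer index 13 = block4_conv3); with a real photo you would also apply tf.keras.applications.vgg16.preprocess_input first.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

vgg = tf.keras.applications.VGG16(include_top=True, weights='imagenet')
feature_model = tf.keras.Model(inputs=vgg.input, outputs=vgg.layers[13].output)

img = np.random.rand(1, 224, 224, 3).astype(np.float32)   # stand-in for a real photo
fmap = feature_model.predict(img)                          # shape (1, 28, 28, 512)
plt.imshow(fmap[0].mean(axis=-1))                          # average over channels
plt.title("Feature Map Before ROI_P")
plt.show()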

I generated this model with the following code:

import tensorflow as tf
from tensorflow.keras.layers import Input, TimeDistributed, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from ROI_Pooling import RoiPoolingConv  # custom RoI pooling layer from this repo

def build_model():
    # per the shapes above, each RoI is pooled to 14*14*512 before re-entering VGG16
    pooled_square_size = 14
    num_rois = 32
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
        include_top=True,
        weights='imagenet'
    )
    # layers[13] is block4_conv3: the 28*28*512 feature map fed to RoI pooling
    x = model_cnn.layers[13].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    # run the remaining VGG16 layers (block5 and the fc head) on every RoI
    for layer in model_cnn.layers[15:]:
        x = TimeDistributed(layer)(x)
    x = TimeDistributed(Dense(512, activation='sigmoid'))(x)
    x = TimeDistributed(Dense(2, activation='softmax'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(lr=0.0001)
    model_final.compile(
        loss=tf.keras.losses.BinaryCrossentropy(),
        optimizer=opt,
        metrics=["accuracy"]
    )
    model_final.save("TrainedModels" + slash + "FastRCNN.h5")  # 'slash' is the path separator defined elsewhere
    return model_final

The full code can be seen here: Github Repo

I have tried adding BatchNormalization, adjusting the LR, and simply adding more layers, but the model does not improve at all. I would really appreciate it if someone could point out the key flaw in this model so that I can test it further. Thanks!
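One way to check the "weights do not change after an epoch" symptom numerically (weight_change_report is a hypothetical helper, not in the repo): snapshot the trainable weights, run one epoch, and print how much each layer actually moved.

import numpy as np

def weight_change_report(model, train_data, **fit_kwargs):
    """train_data: whatever is normally passed as the first argument to fit
    (e.g. a tf.data.Dataset or a generator yielding ([images, rois], labels))."""
    before = [w.numpy().copy() for w in model.trainable_weights]
    model.fit(train_data, epochs=1, **fit_kwargs)
    for w_old, w_var in zip(before, model.trainable_weights):
        delta = np.abs(w_var.numpy() - w_old).mean()
        print("%-40s mean |delta| = %.6g" % (w_var.name, delta))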

I strongly suspect there is something odd about this VGG16:

Here is the input image:

Here is its corresponding output feature map:

Damn, I figured out what the problem is:

In ROI_Pooling.py:

def call(self, x, mask=None):
    assert (len(x) == 2)
    # x[0] is the feature map with shape (batch, rows, cols, channels)
    img = x[0]
    # x[1] is the roi tensor with shape (batch, num_rois, 4), ordered (x1, y1, x2, y2)
    rois = x[1]

    input_shape = img.shape

    outputs = []

    x1 = rois[:, :, 0]
    y1 = rois[:, :, 1]
    x2 = rois[:, :, 2]
    y2 = rois[:, :, 3]

It used to be:

def call(self, x, mask=None):
    assert (len(x) == 2)

    # x[0] is image with shape (rows, cols, channels)
    img = x[0]

    # x[1] is roi with shape (num_rois,4) with ordering (x,y,w,h)
    rois = x[1]

    input_shape = img.shape

    outputs = []

    for roi_idx in range(self.num_rois):
        x1 = rois[0, roi_idx, 0]
        y1 = rois[0, roi_idx, 1]
        x2 = rois[0, roi_idx, 2]
        y2 = rois[0, roi_idx, 3]

You can clearly see that only the RoIs of the first image in each batch were ever used to produce the output.
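A tiny numpy demonstration of the difference (illustration only): with a batch of 2 images and 3 RoIs each, the old per-scalar indexing only ever reads batch item 0, while the vectorized slicing covers every item in the batch.

import numpy as np

rois = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # (batch, num_rois, 4)

# old version: loops over roi_idx but always indexes batch item 0
old_x1 = [rois[0, roi_idx, 0] for roi_idx in range(3)]
print(old_x1)        # [0, 4, 8]  -> only the first image's RoIs

# new version: keeps the batch dimension
new_x1 = rois[:, :, 0]
print(new_x1)        # [[ 0  4  8]
                     #  [12 16 20]] -> RoIs of every image in the batch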

Now the results are much better: