How to compare training and test performance in a Faster RCNN object detection model

I am following the tutorial here on implementing Faster RCNN with PyTorch for a custom dataset.

Here is my training loop:

for images, targets in metric_logger.log_every(data_loader, print_freq, header):
    # Move the inputs to the GPU
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

    # In train mode the model returns a dict of losses
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

    # reduce losses over all GPUs for logging purposes
    loss_dict_reduced = reduce_dict(loss_dict)
    losses_reduced = sum(loss for loss in loss_dict_reduced.values())
    loss_value = losses_reduced.item()

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()

The metric logger (defined here) prints the following to the console during training; each metric is shown as a smoothed median, with the running global average in parentheses:

Epoch: [0]  [  0/226]  eta: 0:07:57  lr: 0.000027  loss: 6.5019 (6.5019)  loss_classifier: 0.8038 (0.8038)  loss_box_reg: 0.1398 (0.1398)  loss_objectness: 5.2717 (5.2717)  loss_rpn_box_reg: 0.2866 (0.2866)  time: 2.1142  data: 0.1003  max mem: 3827
Epoch: [0]  [ 30/226]  eta: 0:02:28  lr: 0.000693  loss: 1.3016 (2.4401)  loss_classifier: 0.2914 (0.4067)  loss_box_reg: 0.2294 (0.2191)  loss_objectness: 0.3558 (1.2913)  loss_rpn_box_reg: 0.3749 (0.5230)  time: 0.7128  data: 0.0923  max mem: 4341

After one epoch is complete, I call an evaluate method, which outputs the following:

Test:  [  0/100]  eta: 0:00:25  model_time: 0.0880 (0.0880)  evaluator_time: 0.1400 (0.1400)  time: 0.2510  data: 0.0200  max mem: 4703
Test:  [ 99/100]  eta: 0:00:00  model_time: 0.0790 (0.0786)  evaluator_time: 0.0110 (0.0382)  time: 0.1528  data: 0.0221  max mem: 4703
Test: Total time: 0:00:14 (0.1401 s / it)
Averaged stats: model_time: 0.0790 (0.0786)  evaluator_time: 0.0110 (0.0382)
Accumulating evaluation results...
DONE (t=0.11s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.263
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.346
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.304
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.308
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.013
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.027
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.175
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.311
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.264
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.351
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.086

I'm a little confused by the different metrics used during training and testing. I'd like to plot the training and validation loss (or equivalent IoU values) so that I can visualise training and test performance, and check whether any overfitting is occurring.

My question is: how do I compare the training and test performance of the model?

The evaluate() function here doesn't calculate any loss. If you look at how the loss is calculated in train_one_epoch() here, you'll see that the model actually needs to be in train mode for it to return losses. So you can write a function that behaves like train_one_epoch(), except that it doesn't update the weights, e.g.:

import torch

import utils  # the torchvision detection reference utils.py


@torch.no_grad()  # no gradients are tracked, so no weights can be updated
def evaluate_loss(model, data_loader, device):
    # train mode so that the model returns losses instead of detections
    model.train()
    metric_logger = utils.MetricLogger(delimiter="  ")
    header = 'Test:'
    for images, targets in metric_logger.log_every(data_loader, 100, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)

        # reduce losses over all GPUs for logging purposes
        loss_dict_reduced = utils.reduce_dict(loss_dict)
        losses_reduced = sum(loss for loss in loss_dict_reduced.values())

        metric_logger.update(loss=losses_reduced, **loss_dict_reduced)

    # return the logger so the caller can read the averaged validation loss
    return metric_logger

But since the model has to be in eval mode to get the predicted bounding boxes, you will also need the loop from the original evaluate() code if you want the mAP.
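
Putting the pieces together, you can then record the training loss, validation loss, and mAP once per epoch and plot them. Here is a minimal sketch, assuming the reference scripts engine.py and utils.py from the tutorial are on your path, and that evaluate_loss is the helper above (which returns its metric logger); model, optimizer, data_loader, data_loader_test and num_epochs are placeholders for your own objects. In the reference scripts, train_one_epoch() returns its MetricLogger and evaluate() returns its CocoEvaluator, so the averaged numbers can be read off directly.

import matplotlib.pyplot as plt

from engine import train_one_epoch, evaluate  # torchvision reference scripts

num_epochs = 10  # placeholder; use your own schedule
train_losses, val_losses, val_maps = [], [], []

for epoch in range(num_epochs):
    # train mode, weights updated
    train_logger = train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    train_losses.append(train_logger.meters['loss'].global_avg)

    # train mode, but no weight updates: the helper defined above
    val_logger = evaluate_loss(model, data_loader_test, device)
    val_losses.append(val_logger.meters['loss'].global_avg)

    # eval mode: predicts boxes and computes the COCO metrics
    coco_evaluator = evaluate(model, data_loader_test, device=device)
    # stats[0] is AP @[ IoU=0.50:0.95 | area=all | maxDets=100 ]
    val_maps.append(coco_evaluator.coco_eval['bbox'].stats[0])

plt.plot(train_losses, label='training loss')
plt.plot(val_losses, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()

If the validation loss starts climbing while the training loss keeps falling, or the mAP stops improving, that is the usual sign of overfitting.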