Faster-RCNN PyTorch problem at prediction time with image dimensions

I am fine-tuning Faster-RCNN with PyTorch, following this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html


# This works well
>>> img, _ = dataset_test[3]
>>> img.shape
torch.Size([3, 1200, 1600])
>>> model.eval()
>>> with torch.no_grad():
...     preds = model([img.to(device)])


# This fails: four images stacked into a single 4D tensor
>>> random_idx = torch.randint(high=50, size=(4,))
>>> images = torch.stack([dataset_test[idx][0] for idx in random_idx])
>>> images.shape
torch.Size([4, 3, 1200, 1600])
>>> with torch.no_grad():
...     preds = model(images.to(device))
RuntimeError                              Traceback (most recent call last)
<ipython-input-101-52caf8fee7a4> in <module>()
      5 model.eval()
      6 with torch.no_grad():
----> 7   prediction =  model(images.to(device))


RuntimeError: The expanded size of the tensor (1600) must match the existing size (1066) at non-singleton dimension 2.  Target sizes: [3, 1200, 1600].  Tensor sizes: [3, 800, 1066]


It works when feeding a list of 3D tensors (IMO this behavior is a bit odd; I don't understand why it doesn't work with a 4D tensor):

>>> random_idx = torch.randint(high=50, size=(4,))
>>> images = [dataset_test[idx][0].to(device) for idx in random_idx]
>>> len(images), images[0].shape  # a Python list has no .shape attribute
(4, torch.Size([3, 1200, 1600]))
>>> with torch.no_grad():
...     preds = model(images)
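
If the images are already stacked into a 4D batch, one way to get back to the list form is to split the batch along its first dimension. This is a minimal sketch, not something from the tutorial: `batch` is a hypothetical stand-in filled with random data, and it uses a smaller image size to stay light.

```python
import torch

# Hypothetical stand-in for four stacked test images (random data, reduced size).
batch = torch.rand(4, 3, 120, 160)

# unbind(0) splits along the batch dimension and returns a tuple of 3D tensors;
# wrapping it in list() gives the list-of-images form used in the working call above.
images = list(batch.unbind(0))

print(len(images), images[0].shape)
```

The resulting `images` list can then be passed to the model the same way as in the working example, after moving each tensor to the target device.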

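In training mode, MaskRCNN expects a list of tensors as the 'input images' and a list of dictionaries as the 'target'. This design choice comes from the fact that each image can contain a variable number of objects, i.e. the target tensor for each image has variable dimensions, so we are forced to use a list rather than a batched tensor for the targets.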
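
To illustrate the point about variable-sized targets, here is a small sketch of what such inputs might look like. The helper `make_target`, the box coordinates, and the image sizes are all made up for illustration; the `"boxes"` and `"labels"` keys follow the format used by torchvision's detection models (Mask R-CNN additionally needs `"masks"`, omitted here).

```python
import torch

def make_target(boxes, labels):
    """Hypothetical helper: build one per-image target dict.

    boxes are [x1, y1, x2, y2] in pixels; labels are integer class ids.
    """
    return {
        "boxes": torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4),
        "labels": torch.tensor(labels, dtype=torch.int64),
    }

# Two images (here even of different sizes), kept in a plain Python list.
images = [torch.rand(3, 120, 160), torch.rand(3, 100, 140)]

# The first image has two objects, the second has one.
targets = [
    make_target([[10, 20, 60, 80], [30, 30, 90, 110]], [1, 2]),
    make_target([[5, 5, 50, 50]], [1]),
]

# The per-image box tensors have shapes (2, 4) and (1, 4), so they cannot be
# stacked into a single batched tensor.
try:
    torch.stack([t["boxes"] for t in targets])
    stackable = True
except RuntimeError:
    stackable = False

print([tuple(t["boxes"].shape) for t in targets], stackable)
```

In training mode the model would then be called as `model(images, targets)`, with one dictionary per image.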


