Error when trying to train FasterRCNN with custom backbone on GRAYSCALE images
I followed the instructions in the tutorial at https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#putting-everything-together to create an object detector for 1 class on GRAYSCALE images.
Here is my code (note that I am using a DenseNet as the BACKBONE, a model I pretrained on my own dataset):
import os

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

num_classes = 2  # 1 class + background

# load the DenseNet pretrained on my own dataset and use its feature
# extractor as the backbone
model = torch.load(os.path.join(patch_classifier_model_dir, "densenet121.pt"))
backbone = model.features
backbone.out_channels = 1024

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

# move model to the right device
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
Here is the error I am running into:
RuntimeError: Given groups=1, weight of size [64, 1, 7, 7], expected input[2, 3, 1344, 800] to have 1 channels, but got 3 channels instead
Based on the FasterRCNN architecture, I believe the problem lies in the transform component, because it tries to normalize images that are originally grayscale rather than RGB:
FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): Sequential(
    (conv0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      ...............
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=50176, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=2, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
    )
  )
)
Am I right about this? If so, how do I fix it? Is there a standard practice for handling grayscale images with FasterRCNN?
Thanks in advance! Really appreciate it!
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
means that the normalization is applied to all 3 channels of the input image: 0.485 is applied to the R channel, 0.456 to the G channel, and 0.406 to the B channel. The same goes for the standard deviation values.
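In other words, each channel gets its own mean/std entry. A minimal sketch of the per-channel arithmetic (using a dummy tensor, not the actual transform code):

import torch

# Per-channel normalization: out[c] = (in[c] - mean[c]) / std[c]
img = torch.rand(3, 4, 4)  # dummy RGB image
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
normalized = (img - mean) / std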
The first convolutional layer of your backbone, however, expects a 1-channel input, and that is why you are getting this error.
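You can confirm the mismatch directly (a quick check, assuming backbone is the DenseNet features module from your code, whose first layer is conv0):

# The [64, 1, 7, 7] weight from the error message belongs to this layer:
first_conv = backbone.conv0
print(first_conv.weight.shape)  # torch.Size([64, 1, 7, 7]) -> trained on 1 channel
print(first_conv.in_channels)   # 1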
You can solve the problem as follows:
re-define the GeneralizedRCNNTransform with single-channel statistics and attach it to your model. You can do it like this:
# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator, box_roi_pool=roi_pooler)
# Changes: single-entry mean/std so the transform keeps grayscale input 1-channel
grcnn = torchvision.models.detection.transform.GeneralizedRCNNTransform(min_size=800, max_size=1333, image_mean=[0.485], image_std=[0.229])
model.transform = grcnn
model.to(device)
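To verify the fix, a hypothetical smoke test (the 512x512 size and the single dummy image are placeholders) could run a 1-channel tensor through the model in eval mode:

model.eval()
with torch.no_grad():
    # one grayscale image as a [C, H, W] tensor with C=1
    dummy = [torch.rand(1, 512, 512, device=device)]
    predictions = model(dummy)
print(predictions[0].keys())  # dict_keys(['boxes', 'labels', 'scores'])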