Overriding the stem module in a video classification model to change the filter channels
I'm trying to use torchvision's video classification models (R3D, R(2+1)D, MC3), but my data is single-channel (grayscale video), while these models take 3-channel input. So I'm trying to override the stem class. Can someone confirm whether what I'm doing is correct?
For R3D-18 and MC3-18, stem=BasicStem:
import torch
import torch.nn as nn
import torchvision

class BasicStemModified(nn.Sequential):
    def __init__(self):
        super(BasicStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),  # changing the filter to 1-channel input; Conv3d dims are (T, H, W)
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.mc3_18(pretrained=False)
model.stem = BasicStemModified()  # assign the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)  # num_classes: number of target classes
)
model.to('cuda:0')
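As a quick sanity check (a minimal sketch; the clip shape below is an arbitrary example), a dummy single-channel clip should pass through the modified network:

x = torch.randn(2, 1, 16, 112, 112, device='cuda:0')  # (N, C=1, T, H, W) dummy grayscale clip
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([2, num_classes])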
For R(2+1)D:
# For the R(2+1)D model, `stem=R2Plus1dStem`
class R2Plus1dStemModified(nn.Sequential):
    """The R(2+1)D stem differs from the default one: it uses a separated
    (2+1)D convolution (spatial, then temporal)."""
    def __init__(self):
        super(R2Plus1dStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),  # changing the filter to 1-channel input
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.r2plus1d_18(pretrained=False)
model.stem = R2Plus1dStemModified()  # assign the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)  # num_classes: number of target classes
)
model.to('cuda:0')
When switching from RGB to grayscale, the easiest approach is to change the data rather than the model:
If your input frames have only one channel (gray), you can simply expand the singleton channel dimension to span three channels. This is trivial and allows you to use the pretrained models as-is.
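For example (a minimal sketch, assuming clips in the (N, C, T, H, W) layout that torchvision's video models expect):

model = torchvision.models.video.mc3_18(pretrained=True)
model.eval()
x = torch.randn(2, 1, 16, 112, 112)   # (N, C=1, T, H, W) grayscale clip
x_rgb = x.expand(-1, 3, -1, -1, -1)   # repeat the channel as a view; no copy is made
out = model(x_rgb)                    # works with the unmodified pretrained model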
If you insist on modifying the model, you can do so while keeping most of the pretrained weights:
model = torchvision.models.video.mc3_18(pretrained=True)  # get the pretrained model
# modify only the first conv layer
origc = model.stem[0]  # the original conv layer
# build a new layer with only one input channel
c1 = torch.nn.Conv3d(1, origc.out_channels, kernel_size=origc.kernel_size,
                     stride=origc.stride, padding=origc.padding,
                     bias=origc.bias is not None)
# this is the nice part - init the new weights using the original ones
with torch.no_grad():
    c1.weight.copy_(origc.weight.sum(dim=1, keepdim=True))
model.stem[0] = c1  # swap the new conv into the stem
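With this initialization, the new layer produces exactly what the original layer would produce on the same gray frame replicated across three channels. A quick check (the clip shape is an arbitrary example; the tolerance allows for float round-off):

x = torch.randn(2, 1, 16, 112, 112)  # (N, C=1, T, H, W)
with torch.no_grad():
    same = torch.allclose(c1(x), origc(x.expand(-1, 3, -1, -1, -1)), atol=1e-4)
print(same)  # True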