Overriding the stem module in a video classification model to change the filter channels
I'm trying to use torchvision's video classification models (R3D, R(2+1)D, MC3), but my data is single-channel (grayscale video), while these models take 3-channel input. So I'm trying to override the stem class. Can someone confirm whether what I'm doing is correct?
For R3D-18 and MC3-18, stem=BasicStem:
import torch
import torch.nn as nn
import torchvision

class BasicStemModified(nn.Sequential):
    def __init__(self):
        super(BasicStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),  # changing the filter to 1-channel input; Conv3d dims are (T, H, W)
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.mc3_18(pretrained=False)
model.stem = BasicStemModified()  # assign the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)  # num_classes: number of target classes
)
model.to('cuda:0')
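As a quick sanity check (a minimal sketch; the clip shape below is an arbitrary example), a dummy single-channel clip should pass through the modified network:

x = torch.randn(2, 1, 16, 112, 112, device='cuda:0')  # (N, C=1, T, H, W) dummy grayscale clip
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([2, num_classes])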
For R(2+1)D:
# For the R(2+1)D model, `stem=R2Plus1dStem`
class R2Plus1dStemModified(nn.Sequential):
    """The R(2+1)D stem differs from the default one: it uses a separated
    (2+1)D convolution (spatial, then temporal)."""
    def __init__(self):
        super(R2Plus1dStemModified, self).__init__(
            nn.Conv3d(1, 45, kernel_size=(1, 7, 7),  # changing the filter to 1-channel input
                      stride=(1, 2, 2), padding=(0, 3, 3),
                      bias=False),
            nn.BatchNorm3d(45),
            nn.ReLU(inplace=True),
            nn.Conv3d(45, 64, kernel_size=(3, 1, 1),
                      stride=(1, 1, 1), padding=(1, 0, 0),
                      bias=False),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True))
model = torchvision.models.video.r2plus1d_18(pretrained=False)
model.stem = R2Plus1dStemModified()  # assign the modified stem
model.fc = nn.Sequential(
    nn.Dropout(0.3),
    nn.Linear(model.fc.in_features, num_classes)  # num_classes: number of target classes
)
model.to('cuda:0')
When switching from RGB to grayscale, the easiest approach is to change the data rather than the model:
If your input frames have only one channel (gray), you can simply expand the singleton channel dimension to span three channels. This is trivial and allows you to use the pretrained models as-is.
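For example (a minimal sketch, assuming clips in the (N, C, T, H, W) layout that torchvision's video models expect):

model = torchvision.models.video.mc3_18(pretrained=True)
model.eval()
x = torch.randn(2, 1, 16, 112, 112)   # (N, C=1, T, H, W) grayscale clip
x_rgb = x.expand(-1, 3, -1, -1, -1)   # repeat the channel as a view; no copy is made
out = model(x_rgb)                    # works with the unmodified pretrained model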
If you insist on modifying the model, you can do so while keeping most of the pretrained weights:
model = torchvision.models.video.mc3_18(pretrained=True)  # get the pretrained model
# modify only the first conv layer
origc = model.stem[0]  # the original conv layer
# build a new layer with only one input channel
c1 = torch.nn.Conv3d(1, origc.out_channels, kernel_size=origc.kernel_size,
                     stride=origc.stride, padding=origc.padding,
                     bias=origc.bias is not None)
# this is the nice part - init the new weights using the original ones
with torch.no_grad():
    c1.weight.copy_(origc.weight.sum(dim=1, keepdim=True))
model.stem[0] = c1  # swap the new conv into the stem
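With this initialization, the new layer produces exactly what the original layer would produce on the same gray frame replicated across three channels. A quick check (the clip shape is an arbitrary example; the tolerance allows for float round-off):

x = torch.randn(2, 1, 16, 112, 112)  # (N, C=1, T, H, W)
with torch.no_grad():
    same = torch.allclose(c1(x), origc(x.expand(-1, 3, -1, -1, -1)), atol=1e-4)
print(same)  # True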