PyTorch AutoEncoder - Decoded output dimension not the same as input
I am building a custom autoencoder to train on a dataset. My model is as follows:
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()
        # Three stride-1 convolutions followed by three stride-2 convolutions.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=2),
            nn.ReLU(inplace=True)
        )
        # Mirror of the encoder built from transposed convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=32, out_channels=3, kernel_size=3, stride=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.encoder(x)
        print(x.shape)  # shape of the encoded representation
        x = self.decoder(x)
        return x


def unit_test():
    num_minibatch = 16
    img = torch.randn(num_minibatch, 3, 512, 640).cuda(0)
    model = AutoEncoder().cuda()
    model = nn.DataParallel(model)
    output = model(img)
    print(output.shape)


if __name__ == '__main__':
    unit_test()
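As you can see, my input dimension is (3, 512, 640), but the output after passing through the decoder is (3, 507, 635). Am I missing something when adding the ConvTranspose2d layers?
Any help would be appreciated. Thanks.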
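The mismatch comes from the output shapes of the ConvTranspose2d layers: a stride-2 convolution rounds its output size down, and the plain transposed convolution does not restore the row and column that were dropped. You can fix this by adding an output_padding of 1 to the first and third transposed convolution layers, i.e.
nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=2, output_padding=1)
and
nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=2, output_padding=1)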
According to the documentation:
"When stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side."
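The ambiguity the documentation describes can be checked with plain arithmetic. The sketch below assumes padding=0 and dilation=1, as in the model above. The helpers conv_out and conv_transpose_out are hypothetical names used for illustration, not PyTorch functions; they only apply the output-size formulas given in the Conv2d and ConvTranspose2d documentation.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of Conv2d along one dimension (dilation=1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length of ConvTranspose2d along one dimension (dilation=1 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# With kernel_size=5 and stride=2, two different input heights collapse
# to the same output height because of the floor division.
h_small = conv_out(505, kernel=5, stride=2)
h_large = conv_out(506, kernel=5, stride=2)
print(h_small, h_large)

# The transposed convolution has to pick one of the two candidates;
# output_padding=1 selects the larger one.
print(conv_transpose_out(251, kernel=5, stride=2))
print(conv_transpose_out(251, kernel=5, stride=2, output_padding=1))
```

In this model the height entering the first stride-2 convolution is 506, so the mirrored transposed convolution needs output_padding=1 to get back to 506 rather than 505.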
Shapes of the decoder layers before adding output_padding:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
   ConvTranspose2d-1        [-1, 512, 123, 155]      13,107,712
              ReLU-2        [-1, 512, 123, 155]               0
   ConvTranspose2d-3        [-1, 256, 249, 313]       3,277,056
              ReLU-4        [-1, 256, 249, 313]               0
   ConvTranspose2d-5        [-1, 128, 501, 629]         819,328
              ReLU-6        [-1, 128, 501, 629]               0
   ConvTranspose2d-7         [-1, 64, 503, 631]          73,792
              ReLU-8         [-1, 64, 503, 631]               0
   ConvTranspose2d-9         [-1, 32, 505, 633]          18,464
             ReLU-10         [-1, 32, 505, 633]               0
  ConvTranspose2d-11          [-1, 3, 507, 635]             867
             ReLU-12          [-1, 3, 507, 635]               0
After adding output_padding:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
   ConvTranspose2d-1        [-1, 512, 124, 156]      13,107,712
              ReLU-2        [-1, 512, 124, 156]               0
   ConvTranspose2d-3        [-1, 256, 251, 315]       3,277,056
              ReLU-4        [-1, 256, 251, 315]               0
   ConvTranspose2d-5        [-1, 128, 506, 634]         819,328
              ReLU-6        [-1, 128, 506, 634]               0
   ConvTranspose2d-7         [-1, 64, 508, 636]          73,792
              ReLU-8         [-1, 64, 508, 636]               0
   ConvTranspose2d-9         [-1, 32, 510, 638]          18,464
             ReLU-10         [-1, 32, 510, 638]               0
  ConvTranspose2d-11          [-1, 3, 512, 640]             867
             ReLU-12          [-1, 3, 512, 640]               0
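Which layers need output_padding can also be worked out instead of found by trial and error: each transposed convolution needs exactly the remainder that the floor division discarded in the encoder layer it mirrors. The following is a minimal sketch under the same assumptions as the model (padding=0, dilation=1). The ENCODER list and the function names are illustrative and not part of PyTorch.

```python
# (kernel_size, stride) of each encoder Conv2d, in order; padding is 0 throughout.
ENCODER = [(3, 1), (3, 1), (3, 1), (5, 2), (5, 2), (5, 2)]


def decoder_output_padding(size, layers):
    """Return the encoded size and the output_padding each mirrored
    ConvTranspose2d needs to undo the encoder along one dimension."""
    paddings = []
    for kernel, stride in layers:
        # Remainder dropped by the floor in the Conv2d output-size formula.
        paddings.append((size - kernel) % stride)
        size = (size - kernel) // stride + 1
    # The decoder applies the layers in reverse order.
    return size, paddings[::-1]


def decode_size(size, layers, paddings):
    """Apply the ConvTranspose2d output-size formula layer by layer."""
    for (kernel, stride), extra in zip(layers[::-1], paddings):
        size = (size - 1) * stride + kernel + extra
    return size


for full in (512, 640):
    encoded, pads = decoder_output_padding(full, ENCODER)
    print(full, "->", encoded, "->", decode_size(encoded, ENCODER, pads), pads)
```

For both the height (512) and the width (640) this gives an output_padding of 1 on the first and third decoder layers and 0 elsewhere, which agrees with the fix above. If the two dimensions ever needed different values, output_padding also accepts a (height, width) tuple.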