Where should I put the input image dimensions in the following PyTorch architecture?

Question

import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self, channels=3):
        super(Discriminator, self).__init__()

        self.channels = channels

        def convlayer(n_input, n_output, k_size=4, stride=2, padding=0, bn=False):
            # Conv2d -> optional BatchNorm2d -> LeakyReLU
            block = [nn.Conv2d(n_input, n_output, kernel_size=k_size, stride=stride, padding=padding, bias=False)]
            if bn:
                block.append(nn.BatchNorm2d(n_output))
            block.append(nn.LeakyReLU(0.2, inplace=True))
            return block

        self.model = nn.Sequential(
            *convlayer(self.channels, 32, 4, 2, 1),
            *convlayer(32, 64, 4, 2, 1),
            *convlayer(64, 128, 4, 2, 1, bn=True),
            *convlayer(128, 256, 4, 2, 1, bn=True),
            nn.Conv2d(256, 1, 4, 1, 0, bias=False),  # FC with Conv.
        )

    def forward(self, imgs):
        logits = self.model(imgs)
        out = torch.sigmoid(logits)

        return out.view(-1, 1)

The architecture above is the Discriminator of a GAN model. I am a little confused by the first layer:

*convlayer(self.channels, 32, 4, 2, 1)

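self.channels, which is 3 (color image), is passed in, and I have an input image of 64 * 64 * 3. My first question is: where are the dimensions of the input image taken into account in the architecture above?

I am confused because when I look at the Generator architecture,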

import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, nz=128, channels=3):
        super(Generator, self).__init__()

        self.nz = nz
        self.channels = channels

        def convlayer(n_input, n_output, k_size=4, stride=2, padding=0):
            # ConvTranspose2d -> BatchNorm2d -> ReLU
            block = [
                nn.ConvTranspose2d(n_input, n_output, kernel_size=k_size, stride=stride, padding=padding, bias=False),
                nn.BatchNorm2d(n_output),
                nn.ReLU(inplace=True),
            ]
            return block

        self.model = nn.Sequential(
            *convlayer(self.nz, 1024, 4, 1, 0),  # Fully connected layer via convolution.
            *convlayer(1024, 512, 4, 2, 1),
            *convlayer(512, 256, 4, 2, 1),
            *convlayer(256, 128, 4, 2, 1),
            *convlayer(128, 64, 4, 2, 1),
            nn.ConvTranspose2d(64, self.channels, 3, 1, 1),
            nn.Tanh()
        )

    def forward(self, z):
        # Reshape the latent vector to (batch, nz, 1, 1) so it can be upsampled.
        z = z.view(-1, self.nz, 1, 1)
        img = self.model(z)
        return img

in the first layer

*convlayer(self.nz, 1024, 4, 1, 0)

they pass self.nz, which is the 128 random latent points needed to generate a 64 * 64 * 3 image, in contrast to the Discriminator model above, where channels is passed.

My second question is: if I have a 300 * 300 * 3 image, what should I change in my Discriminator architecture to process it?

P.S. I am new to PyTorch.

Answer 1

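You don't need the dimensions of the input image in a convolution at all. All you do is perform a kernel convolution over the image (with or without padding). You only have to make sure that the input to a convolutional layer is at least as large as that layer's kernel. For example, you cannot apply a 3x3 kernel to a 2x2 image. Of course, you can work around this with padding, but without it the operation is not possible.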
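As a minimal sketch of the point above (the tensor names are only illustrative and are not part of the question's code), the same `nn.Conv2d` layer can be applied to inputs of different sizes, while a kernel larger than an unpadded input raises an error:

```python
import torch
import torch.nn as nn

# The layer is defined by channel counts and kernel geometry only;
# no image height or width appears anywhere in its arguments.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=4, stride=2, padding=1, bias=False)

small = torch.randn(1, 3, 64, 64)     # one 64x64 RGB image
large = torch.randn(1, 3, 300, 300)   # one 300x300 RGB image
small_out = conv(small)
large_out = conv(large)
print(small_out.shape)  # torch.Size([1, 32, 32, 32])
print(large_out.shape)  # torch.Size([1, 32, 150, 150])

# A 3x3 kernel cannot slide over a 2x2 input unless the input is padded.
tiny = torch.randn(1, 3, 2, 2)
try:
    nn.Conv2d(3, 8, kernel_size=3)(tiny)
    too_small_failed = False
except RuntimeError:
    too_small_failed = True
print(too_small_failed)  # True

# With padding=1 the padded input is 4x4, so the 3x3 kernel fits.
padded_out = nn.Conv2d(3, 8, kernel_size=3, padding=1)(tiny)
print(padded_out.shape)  # torch.Size([1, 8, 2, 2])
```

Only the channel count (3 here) has to match the layer definition; the spatial size is free.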

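The discriminator takes a sample either from your dataset or from the images produced by the generator and evaluates whether it is real or fake. Because this is a CNN rather than a network of linear layers, you don't need to specify the size of the input image.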
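To see what the spatial size does inside the Discriminator from the question, the standard output-size formula for a convolution without dilation, floor((size + 2 * padding - kernel) / stride) + 1, can be traced through its five layers. This is a plain-Python sketch; `conv_out`, `trace`, and `DISCRIMINATOR_LAYERS` are hypothetical helper names, not PyTorch APIs.

```python
def conv_out(size, k_size, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - k_size) // stride + 1

# (kernel, stride, padding) of each conv in the question's Discriminator:
# four stride-2 blocks followed by the final 4x4 "FC with Conv" layer.
DISCRIMINATOR_LAYERS = [(4, 2, 1)] * 4 + [(4, 1, 0)]

def trace(size):
    """Return the feature-map side length after every layer."""
    sizes = []
    for k_size, stride, padding in DISCRIMINATOR_LAYERS:
        size = conv_out(size, k_size, stride, padding)
        sizes.append(size)
    return sizes

print(trace(64))   # [32, 16, 8, 4, 1]
print(trace(300))  # [150, 75, 37, 18, 15]
```

For a 64x64 input the last layer collapses the map to 1x1, which is why it behaves like a fully connected layer; for a 300x300 input the same layers run, but the final map is 15x15.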
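The generator samples from the latent points and then generates an image. If you have 300x300 images, the discriminator's layers will run without any changes. Note, however, that its output is then no longer a single value per image: the final 4x4 convolution yields a 15x15 map instead of a 1x1 one, so out.view(-1, 1) returns 225 scores per image. To keep one score per image you would have to add more strided layers or pool the final map. Also, the generator as written always produces 64x64 images, so it would need additional upsampling layers before it could produce 300x300 output.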
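One possible way to get a single score per image regardless of input size is to average the final map with `nn.AdaptiveAvgPool2d(1)`. The sketch below is an assumption on my part rather than anything from the question's code: `PooledHead` is a hypothetical module name, and averaging the logits before the sigmoid is just one design choice whose effect on GAN training is not evaluated here.

```python
import torch
import torch.nn as nn

class PooledHead(nn.Module):
    """Hypothetical replacement for the Discriminator's last layer and forward() tail."""

    def __init__(self, in_channels=256):
        super().__init__()
        # Same final convolution as in the question's Discriminator.
        self.conv = nn.Conv2d(in_channels, 1, 4, 1, 0, bias=False)
        # Averages whatever spatial map remains down to 1x1.
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, features):
        logits = self.pool(self.conv(features))  # shape (N, 1, 1, 1)
        return torch.sigmoid(logits).view(-1, 1)

head = PooledHead()
feat_64 = torch.randn(2, 256, 4, 4)      # feature map size reached from 64x64 inputs
feat_300 = torch.randn(2, 256, 18, 18)   # feature map size reached from 300x300 inputs
out_64 = head(feat_64)
out_300 = head(feat_300)
print(out_64.shape)   # torch.Size([2, 1])
print(out_300.shape)  # torch.Size([2, 1])
```

With a 4x4 feature map the pooling is a no-op, so the 64x64 behaviour is unchanged; with larger inputs it reduces the 15x15 map to one value per image.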


python-3.x

deep-learning

pytorch

generative-adversarial-network