PyTorch

Question

I created a fully connected network in PyTorch whose input layer has shape (1, 784) and whose first hidden layer has shape (1, 256). In short: nn.Linear(in_features=784, out_features=256, bias=True)

Method 1: model.fc1.weight.data.shape gives me torch.Size([128, 256]), whereas

Method 2: list(model.parameters())[0].shape gives me torch.Size([256, 784])

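In fact, between an input layer of size 784 and a hidden layer of size 256, I expected a matrix of shape (784, 256). So in the first case I see the size of the next hidden layer (128), which makes no sense for the weights between the input and the first hidden layer, and in the second case it looks as if PyTorch has transposed the weight matrix.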

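I don't really understand how PyTorch shapes the different weight matrices, or how I can access individual weights after training. Should I use method 1 or method 2? When I print the corresponding tensors, the output looks exactly alike, yet the shapes differ.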

Answer 1

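In PyTorch, nn.Linear stores its weight with shape [out_features, in_features] and transposes it before applying the matmul operation to the input matrix. That is why the dimensions of the weight matrix are flipped relative to what you expected: you observe [256, 784] rather than [784, 256].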
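As a quick check of the shapes described above, here is a minimal sketch that builds a standalone layer with the same sizes as in the question. The variable names (fc1, x, and the *_shape values) are chosen for illustration only and are not taken from the asker's model.

```python
import torch
from torch import nn

# Same layer sizes as in the question; "fc1" is just an illustrative name.
fc1 = nn.Linear(in_features=784, out_features=256, bias=True)

# The weight is stored as (out_features, in_features), not (in, out).
weight_shape = tuple(fc1.weight.shape)
bias_shape = tuple(fc1.bias.shape)

# A single input row of 784 features still maps to 256 outputs.
x = torch.randn(1, 784)
out_shape = tuple(fc1(x).shape)

print(weight_shape)  # (256, 784)
print(bias_shape)    # (256,)
print(out_shape)     # (1, 256)
```

The stored layout differs from the (784, 256) matrix one would write on paper, but the layer still maps a (1, 784) input to a (1, 256) output.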

You can see this in the PyTorch source for nn.Linear, where we have:

...

self.weight = Parameter(torch.Tensor(out_features, in_features))

...

def forward(self, input):
        return F.linear(input, self.weight, self.bias)

Looking at the implementation of F.linear, we find the line that multiplies the input matrix by the transpose of the weight matrix:

output = input.matmul(weight.t())
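The transposed multiplication can be verified numerically. The sketch below compares F.linear with an explicit matmul against the transposed weight plus the bias. It also lists parameters by name, which is one way to avoid guessing which tensor a positional index such as list(model.parameters())[0] refers to. The layer, batch size, and variable names are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Illustrative layer and batch; sizes follow the question.
fc1 = nn.Linear(in_features=784, out_features=256, bias=True)
x = torch.randn(4, 784)

# What the layer computes via F.linear ...
reference = F.linear(x, fc1.weight, fc1.bias)

# ... and the same computation written out: x @ W^T + b.
manual = x.matmul(fc1.weight.t()) + fc1.bias

# Allow a small tolerance for floating-point differences.
matches = torch.allclose(reference, manual, atol=1e-4)
print(matches)

# Looking parameters up by name makes clear which tensor is which.
named_shapes = {name: tuple(p.shape) for name, p in fc1.named_parameters()}
print(named_shapes)
```

On a full model, model.named_parameters() yields names prefixed with the submodule (for example fc1.weight), so each shape can be matched to its layer directly.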

PyTorch - unexpected shape of model parameters weights

neural-network

deep-learning

tensor