Understanding PyTorch Weights and Biases in a Linear Layer
Below is the code for combining the weights and biases of two linear layers into a single layer. I can't understand the line below: why do we have to multiply the bias by the transposed weight matrix? I thought it should just be the bias, without any weights, since we already multiply by the weights to get the final outputs3.
combined_layer.bias.data = layer1.bias @ layer2.weight.t() + layer2.bias
# Create a single layer to replace the two linear layers
combined_layer = nn.Linear(input_size, output_size)
combined_layer.weight.data = layer2.weight @ layer1.weight
combined_layer.bias.data = layer1.bias @ layer2.weight.t() + layer2.bias  # This should be just the bias?
outputs3 = inputs @ combined_layer.weight.t() + combined_layer.bias
Can anyone help me understand this?
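For context, here is a minimal, self-contained version of the setup being asked about. The sizes and the random inputs are hypothetical; any compatible shapes would do:

import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, output_size = 4, 5, 3  # hypothetical sizes

# Two stacked linear layers
layer1 = nn.Linear(input_size, hidden_size)
layer2 = nn.Linear(hidden_size, output_size)

# The combined layer from the question
combined_layer = nn.Linear(input_size, output_size)
combined_layer.weight.data = layer2.weight @ layer1.weight
combined_layer.bias.data = layer1.bias @ layer2.weight.t() + layer2.bias

inputs = torch.randn(2, input_size)
outputs = layer2(layer1(inputs))                                     # two-layer forward pass
outputs3 = inputs @ combined_layer.weight.t() + combined_layer.bias  # single combined layer
print(torch.allclose(outputs, outputs3, atol=1e-6))  # True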
You just need to expand the original equation of the two Linear layers, i.e.
# out = layer2(layer1(x))
# given (x @ A + B) @ C + D
out = (x @ layer1.weight.t() + layer1.bias) @ layer2.weight.t() + layer2.bias
You can expand (x @ A + B) @ C + D = (x @ A @ C) + B @ C + D:
out = x @ layer1.weight.t() @ layer2.weight.t() + layer1.bias @ layer2.weight.t() + layer2.bias
out = x @ (layer1.weight.t() @ layer2.weight.t()) + (layer1.bias @ layer2.weight.t() + layer2.bias)
# the above equation is x @ (A @ C) + B @ C + D
# now you can assume
combined_layer.weight = layer2.weight @ layer1.weight
combined_layer.bias = layer1.bias @ layer2.weight.t() + layer2.bias
# final output
out = x @ combined_layer.weight.t() + combined_layer.bias
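You can check each step of this expansion numerically. The snippet below assumes the hypothetical layer1, layer2, and inputs from the sketch above:

# Original nested form: (x @ A + B) @ C + D
nested = (inputs @ layer1.weight.t() + layer1.bias) @ layer2.weight.t() + layer2.bias

# Expanded form: x @ (A @ C) + (B @ C + D)
expanded = (inputs @ (layer1.weight.t() @ layer2.weight.t())
            + (layer1.bias @ layer2.weight.t() + layer2.bias))

print(torch.allclose(nested, expanded, atol=1e-6))  # True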
Also note that the transpose rule of matrix multiplication is used here, i.e.
transpose(A@B) = transpose(B) @ transpose(A)
This is why x is multiplied by combined_layer.weight.t(): we did not apply any transpose in layer2.weight @ layer1.weight.
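The transpose rule itself is easy to verify; A and B below are arbitrary hypothetical matrices:

A = torch.randn(3, 5)
B = torch.randn(5, 4)
print(torch.allclose((A @ B).t(), B.t() @ A.t()))  # True

# Applied here: storing layer2.weight @ layer1.weight and transposing it later
# recovers the product that multiplies x in the expanded equation.
W = layer2.weight @ layer1.weight
print(torch.allclose(W.t(), layer1.weight.t() @ layer2.weight.t()))  # True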