How to use a layer with gradients but without weight adjustment?
Is it possible to mark part of the forward pass so that it only backpropagates gradients without adjusting the weights?
In the example code below, I have a Module that uses only one layer (one set of parameters), but applies it twice during the forward pass. During optimization, I would expect the weights to be adjusted based on both uses. What can I do if I want only one of the layer's two uses to adjust the weights?
import torch

class ExampleModel(torch.nn.Module):
    def __init__(self, dim) -> None:
        super(ExampleModel, self).__init__()
        self.linear = torch.nn.Linear(dim, dim)

    def forward(self, x):
        out1 = self.linear(x)     # backprop gradients and adjust weights here
        out2 = self.linear(out1)  # only backprop gradients here
        return out2

# Random input/output data for this example
N, D = 64, 100
x = torch.randn(N, D)
y = torch.randn(N, D)

model = ExampleModel(D)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters())

y_pred = model(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
The following does not work, because under torch.no_grad() no gradients are backpropagated at all (a quick standalone check after the snippet illustrates this):
def forward(self, x):
    out1 = self.linear(x)  # backprop gradients and adjust weights here
    with torch.no_grad():
        out2 = self.linear(out1)  # only backprop gradients here
    return out2
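As a quick illustration (my own standalone sketch with a small Linear layer, not the model above), the output produced under no_grad carries no grad_fn, so the graph is cut right there and calling backward() on a loss built from it would fail:
import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

out1 = lin(x)
with torch.no_grad():
    out2 = lin(out1)

print(out1.requires_grad)  # True  - still connected to the graph
print(out2.requires_grad)  # False - nothing can be backpropagated from here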
I cannot simply exclude the parameters from the optimizer, because they still need to be optimized for the first use (i.e. out1 = self.linear(x)).
For the same reason, I also cannot set the learning rate of these parameters to 0; a sketch of that ruled-out setup is shown below.
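For completeness, this is roughly what that ruled-out setup would look like (a sketch only; the zero learning rate freezes self.linear for both of its uses in the forward pass, not just the second one):
# Not a solution: lr=0 (or leaving the parameters out entirely) freezes
# self.linear for *both* uses, not only the second one.
optimizer = torch.optim.Adam(
    [{'params': model.linear.parameters(), 'lr': 0.0}],
    lr=1e-3,
)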
What else can I do to achieve this?
One approach is to temporarily disable gradients for the layer's parameters using requires_grad_:
def forward(self, x):
    out1 = self.linear(x)  # backprop gradients and adjust weights here
    self.linear.requires_grad_(False)  # freeze the parameters for the next call
    out2 = self.linear(out1)  # only backprop gradients here
    self.linear.requires_grad_(True)   # re-enable so the first use keeps updating
    return out2
This still lets gradients flow through the activations; it only stops them from reaching the parameters.
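A quick way to see that (a self-contained sketch, separate from the model above): unlike the no_grad version, the second output still requires grad, and a gradient still lands in the weight, but it contains only the contribution from the first call:
import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

out1 = lin(x)
lin.requires_grad_(False)
out2 = lin(out1)   # weight is treated as a constant for this call
lin.requires_grad_(True)

print(out2.requires_grad)  # True: gradients can still flow back through out1
out2.sum().backward()
print(lin.weight.grad is not None)  # True: accumulated from the first call only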
Alternatively, you can bypass the module call and use the functional form of the layer with detached copies of the weight and bias, via .detach():
import torch.nn.functional as F

def forward(self, x):
    out1 = self.linear(x)  # backprop gradients and adjust weights here
    # Detached weight/bias are treated as constants, so gradients only flow back through out1.
    out2 = F.linear(out1, self.linear.weight.detach(), self.linear.bias.detach())
    return out2
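Both variants should produce the same weight gradient, containing only the first use's contribution. A rough sanity check of that equivalence (a standalone sketch with a small layer) could look like this:
import torch
import torch.nn.functional as F

torch.manual_seed(0)
lin = torch.nn.Linear(5, 5)
x = torch.randn(3, 5)
y = torch.randn(3, 5)
criterion = torch.nn.MSELoss(reduction='sum')

# Variant A: toggle requires_grad_ around the second call.
lin.zero_grad()
out1 = lin(x)
lin.requires_grad_(False)
out2 = lin(out1)
lin.requires_grad_(True)
criterion(out2, y).backward()
grad_a = lin.weight.grad.clone()

# Variant B: functional call with detached weight and bias.
lin.zero_grad()
out1 = lin(x)
out2 = F.linear(out1, lin.weight.detach(), lin.bias.detach())
criterion(out2, y).backward()
grad_b = lin.weight.grad.clone()

print(torch.allclose(grad_a, grad_b))  # True: both only count the first use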