How to use a layer with gradients but without weight adjustment?
Is it possible to mark part of the forward pass so that it only backpropagates gradients without adjusting the weights?
In the example code below, I have a Module that uses only one layer (one set of parameters), but applies it twice during the forward pass. During optimization, I would expect the weights to be adjusted based on both uses. What can I do if I want only one of the layer's two uses to adjust the weights?
import torch

class ExampleModel(torch.nn.Module):
    def __init__(self, dim) -> None:
        super(ExampleModel, self).__init__()
        self.linear = torch.nn.Linear(dim, dim)

    def forward(self, x):
        out1 = self.linear(x)     # backprop gradients and adjust weights here
        out2 = self.linear(out1)  # only backprop gradients here
        return out2

# Random input/output data for this example
N, D = 64, 100
x = torch.randn(N, D)
y = torch.randn(N, D)

model = ExampleModel(D)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters())

y_pred = model(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
The following does not work, because under torch.no_grad() no gradients are backpropagated at all (a quick standalone check after the snippet illustrates this):
def forward(self, x):
    out1 = self.linear(x)  # backprop gradients and adjust weights here
    with torch.no_grad():
        out2 = self.linear(out1)  # only backprop gradients here
    return out2
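As a quick illustration (my own standalone sketch with a small Linear layer, not the model above), the output produced under no_grad carries no grad_fn, so the graph is cut right there and calling backward() on a loss built from it would fail:
import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

out1 = lin(x)
with torch.no_grad():
    out2 = lin(out1)

print(out1.requires_grad)  # True  - still connected to the graph
print(out2.requires_grad)  # False - nothing can be backpropagated from here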
I cannot simply exclude the parameters from the optimizer, because they still need to be optimized for the first use (i.e. out1 = self.linear(x)).
For the same reason, I also cannot set the learning rate of these parameters to 0; a sketch of that ruled-out setup is shown below.
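For completeness, this is roughly what that ruled-out setup would look like (a sketch only; the zero learning rate freezes self.linear for both of its uses in the forward pass, not just the second one):
# Not a solution: lr=0 (or leaving the parameters out entirely) freezes
# self.linear for *both* uses, not only the second one.
optimizer = torch.optim.Adam(
    [{'params': model.linear.parameters(), 'lr': 0.0}],
    lr=1e-3,
)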
What else can I do to achieve this?
One approach is to temporarily disable gradients for the layer's parameters using requires_grad_:
def forward(self, x):
    out1 = self.linear(x)  # backprop gradients and adjust weights here
    self.linear.requires_grad_(False)  # freeze the parameters for the next call
    out2 = self.linear(out1)  # only backprop gradients here
    self.linear.requires_grad_(True)   # re-enable so the first use keeps updating
    return out2
This still lets gradients flow through the activations; it only stops them from reaching the parameters.
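A quick way to see that (a self-contained sketch, separate from the model above): unlike the no_grad version, the second output still requires grad, and a gradient still lands in the weight, but it contains only the contribution from the first call:
import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

out1 = lin(x)
lin.requires_grad_(False)
out2 = lin(out1)   # weight is treated as a constant for this call
lin.requires_grad_(True)

print(out2.requires_grad)  # True: gradients can still flow back through out1
out2.sum().backward()
print(lin.weight.grad is not None)  # True: accumulated from the first call only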
Alternatively, you can bypass the module call and use the functional form of the layer with detached copies of the weight and bias, via .detach():
import torch.nn.functional as F

def forward(self, x):
    out1 = self.linear(x)  # backprop gradients and adjust weights here
    # Detached weight/bias are treated as constants, so gradients only flow back through out1.
    out2 = F.linear(out1, self.linear.weight.detach(), self.linear.bias.detach())
    return out2
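Both variants should produce the same weight gradient, containing only the first use's contribution. A rough sanity check of that equivalence (a standalone sketch with a small layer) could look like this:
import torch
import torch.nn.functional as F

torch.manual_seed(0)
lin = torch.nn.Linear(5, 5)
x = torch.randn(3, 5)
y = torch.randn(3, 5)
criterion = torch.nn.MSELoss(reduction='sum')

# Variant A: toggle requires_grad_ around the second call.
lin.zero_grad()
out1 = lin(x)
lin.requires_grad_(False)
out2 = lin(out1)
lin.requires_grad_(True)
criterion(out2, y).backward()
grad_a = lin.weight.grad.clone()

# Variant B: functional call with detached weight and bias.
lin.zero_grad()
out1 = lin(x)
out2 = F.linear(out1, lin.weight.detach(), lin.bias.detach())
criterion(out2, y).backward()
grad_b = lin.weight.grad.clone()

print(torch.allclose(grad_a, grad_b))  # True: both only count the first use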