retain_graph problem with GRU in PyTorch 1.6
I know that when using loss.backward() with multiple networks and multiple loss functions to optimize each network separately, we need to specify retain_graph=True. But I get an error whether or not I specify this parameter. Here is an MWE that reproduces the problem (on PyTorch 1.6):
import torch
from torch import nn
from torch import optim

torch.autograd.set_detect_anomaly(True)

class GRU1(nn.Module):
    def __init__(self):
        super(GRU1, self).__init__()
        self.brnn = nn.GRU(input_size=2, bidirectional=True, num_layers=1, hidden_size=100)

    def forward(self, x):
        return self.brnn(x)

class GRU2(nn.Module):
    def __init__(self):
        super(GRU2, self).__init__()
        self.brnn = nn.GRU(input_size=200, bidirectional=True, num_layers=1, hidden_size=1)

    def forward(self, x):
        return self.brnn(x)

gru1 = GRU1()
gru2 = GRU2()
gru1_opt = optim.Adam(gru1.parameters())
gru2_opt = optim.Adam(gru2.parameters())
criterion = nn.MSELoss()

for i in range(100):
    gru1_opt.zero_grad()
    gru2_opt.zero_grad()
    vector = torch.randn((15, 100, 2))
    gru1_output, _ = gru1(vector)       # (15, 100, 200)
    loss_gru1 = criterion(gru1_output, torch.randn((15, 100, 200)))
    loss_gru1.backward(retain_graph=True)
    gru1_opt.step()
    gru1_output, _ = gru1(vector)       # (15, 100, 200)
    gru2_output, _ = gru2(gru1_output)  # (15, 100, 2)
    loss_gru2 = criterion(gru2_output, torch.randn((15, 100, 2)))
    loss_gru2.backward(retain_graph=True)
    gru2_opt.step()
    print(f"GRU1 loss: {loss_gru1.item()}, GRU2 loss: {loss_gru2.item()}")
With retain_graph set to True, I get the error
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 300]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Without the parameter, the error is
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
which is expected.
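By default, a graph is freed as soon as backward has run through it once; the following minimal sketch (my own, separate from the MWE above) triggers the same message:

import torch

x = torch.randn(3, requires_grad=True)
y = (x * x).sum()
y.backward()  # first call consumes (frees) the graph's saved tensors
y.backward()  # RuntimeError: Trying to backward through the graph a second time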
Please point out what needs to be changed in the code above so that training can proceed. Any help is appreciated.
In cases like this, you can detach the computation graph to exclude the parameters that should not be optimized by that loss. Here, the graph should be cut after the second forward pass through gru1, i.e.
....
gru1_opt.step()
gru1_output, _ = gru1(vector)
gru1_output = gru1_output.detach()  # cut the graph so loss_gru2 does not backprop into gru1
....
This way, you will not "try to backward through the graph a second time," as the error says.
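For completeness, here is a sketch of the question's training loop with this fix applied (my own assembly, assuming each loss should only update its own network). Since each graph is then traversed exactly once, retain_graph=True can also be dropped from both backward calls:

# Assumes the same gru1, gru2, optimizers, and criterion as in the MWE above.
for i in range(100):
    gru1_opt.zero_grad()
    gru2_opt.zero_grad()
    vector = torch.randn((15, 100, 2))

    # Step 1: optimize gru1 on its own loss; its graph is used only once.
    gru1_output, _ = gru1(vector)
    loss_gru1 = criterion(gru1_output, torch.randn((15, 100, 200)))
    loss_gru1.backward()
    gru1_opt.step()

    # Step 2: fresh forward pass, detached so loss_gru2 only updates gru2.
    gru1_output, _ = gru1(vector)
    gru1_output = gru1_output.detach()
    gru2_output, _ = gru2(gru1_output)
    loss_gru2 = criterion(gru2_output, torch.randn((15, 100, 2)))
    loss_gru2.backward()
    gru2_opt.step()

    print(f"GRU1 loss: {loss_gru1.item()}, GRU2 loss: {loss_gru2.item()}")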