PyTorch: "one of the variables needed for gradient computation has been modified by an inplace operation"
I am training a PyTorch RNN on a text file of song lyrics to predict the next character given a character.
Here is how my RNN is defined:
import torch
import torch.nn as nn
import torch.optim

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # from input, previous hidden state to new hidden state
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        # from input, previous hidden state to output
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        # softmax on output
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        # get new hidden state
        hidden = self.i2h(combined)
        # get output
        output = self.i2o(combined)
        # apply softmax
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

rnn = RNN(input_size=num_chars, hidden_size=200, output_size=num_chars)
criterion = nn.NLLLoss()
lr = 0.01
optimizer = torch.optim.AdamW(rnn.parameters(), lr=lr)
Here is my training function:
def train(train, target):
    hidden = rnn.initHidden()
    loss = 0
    for i in range(len(train)):
        optimizer.zero_grad()
        # get output, hidden state from rnn given input char, hidden state
        output, hidden = rnn(train[i].unsqueeze(0), hidden)
        # returns the index with '1' - identifying the index of the right character
        target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
        loss += criterion(output, target_class)
        loss.backward(retain_graph=True)
        optimizer.step()
        print("done " + str(i) + " loop")
    return output, loss.item() / train.size(0)
When I run my training function, I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [274, 74]], which is output 0 of TBackward, is at version 5; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Interestingly, it gets through two full loops of the training function before giving me that error.
Now, when I remove retain_graph = True from loss.backward(), I get this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
It shouldn't be trying to backward through the graph more than once here. Maybe the graph is not being cleared between training loops?
The problem is that you are accumulating loss values, and with them the computation graphs attached to them, on the variable loss, here:
loss += criterion(output, target_class)
In turn, this means that at every iteration you are trying to backpropagate through the current and all previous loss values, which were computed during earlier forward passes. In this particular case of looping over a dataset one character at a time, that is not the right thing to do.
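To see this concretely, here is a minimal standalone sketch that reproduces the same error; the tensors and names (w, out) are made up for illustration and are not part of your model:

import torch

w = torch.randn(3, requires_grad=True)
loss = 0
for step in range(2):
    out = (w * w).sum()  # each iteration builds a fresh graph and saves tensors for it
    loss += out          # loss now drags along every graph built so far
    loss.backward()      # step 0: fine, and frees step 0's saved tensors;
                         # step 1: re-enters step 0's freed graph and raises
                         # "Trying to backward through the graph a second time"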
An easy fix is to accumulate the underlying scalar value of loss with item(), rather than the tensor itself, and to backpropagate only on the current loss tensor:
total_loss = 0
for i in range(len(train)):
    optimizer.zero_grad()
    output, hidden = rnn(train[i].unsqueeze(0), hidden)
    # detach the hidden state so the next backward() does not
    # reach back into this iteration's (already freed) graph
    hidden = hidden.detach()
    target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
    loss = criterion(output, target_class)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
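Note the added hidden = hidden.detach() line: without it, the next iteration's backward() would still try to traverse the previous iteration's already-freed graph through the carried-over hidden state and fail with the same error. Detaching truncates backpropagation through time to a single character step, which matches the per-character updates you are already making.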
Since you update the model's parameters immediately after backpropagating, you do not need to keep the graph in memory.
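If you would rather make one parameter update per whole sequence instead of per character, a sketch under that assumption, reusing the names from your train function: accumulate the loss tensor across the sequence and call backward() once at the end, so no graph is ever traversed twice:

hidden = rnn.initHidden()
optimizer.zero_grad()
loss = 0
for i in range(len(train)):
    output, hidden = rnn(train[i].unsqueeze(0), hidden)
    target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
    loss += criterion(output, target_class)
loss.backward()   # single backward pass over the whole sequence graph
optimizer.step()  # single parameter update for the sequence
total_loss = loss.item()

This trades memory for gradient signal: the full sequence graph stays alive until the one backward() call, but gradients then flow through the entire sequence.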