PyTorch: "one of the variables needed for gradient computation has been modified by an inplace operation"

I'm training a PyTorch RNN on a text file of song lyrics to predict the next character given a character.

My RNN is defined like this:


import torch
import torch.nn as nn
import torch.optim

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        
        # from input, previous hidden state to new hidden state
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        
        # from input, previous hidden state to output
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        
        # softmax on output
        self.softmax = nn.LogSoftmax(dim = 1)
    
    def forward(self, input, hidden):
        
        combined = torch.cat((input, hidden), 1)
        
        #get new hidden state
        hidden = self.i2h(combined)
        
        #get output
        output = self.i2o(combined)
        
        #apply softmax
        output = self.softmax(output)
        return output, hidden
    
    def initHidden(self): 
        return torch.zeros(1, self.hidden_size)

rnn = RNN(input_size = num_chars, hidden_size = 200, output_size = num_chars)
criterion = nn.NLLLoss()

lr = 0.01
optimizer = torch.optim.AdamW(rnn.parameters(), lr = lr)
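For context, train and target below are assumed to be one-hot encoded character sequences; a hypothetical encoder along these lines (one_hot_sequence and char_to_idx are illustrative names, not part of my script) produces tensors of the shape the code expects:

import torch

# encode a string as a (len(text), num_chars) tensor of one-hot rows
def one_hot_sequence(text, char_to_idx, num_chars):
    encoded = torch.zeros(len(text), num_chars)
    for i, ch in enumerate(text):
        encoded[i][char_to_idx[ch]] = 1
    return encoded

train[i].unsqueeze(0) then has shape (1, num_chars), matching the input_size of the Linear layers.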

Here is my training function:

def train(train, target):
    
    hidden = rnn.initHidden()
    
    loss = 0
    
    for i in range(len(train)):
        
        optimizer.zero_grad()

        # get output, hidden state from rnn given input char, hidden state
        output, hidden = rnn(train[i].unsqueeze(0), hidden)

        # returns the index holding a '1' - identifying the correct character's class index
        target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
        
        loss += criterion(output, target_class)
        
    
        loss.backward(retain_graph = True)
        optimizer.step()
        
        print("done " + str(i) + " loop")
    
    return output, loss.item() / train.size(0)

When I run my training function, I get this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [274, 74]], which is output 0 of TBackward, is at version 5; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Interestingly, it completes two full loops of the training function before giving me that error.

Now, when I remove retain_graph = True from loss.backward(), I get this error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

It shouldn't be trying to backward through the graph more than once here. Maybe the graph isn't being cleared between training loops?

The problem is that you are accumulating loss values, and with them their attached computation graphs, in the variable loss, here:

    loss += criterion(output, target_class)

In turn, this means that on every iteration you are trying to backpropagate through the current and all previous loss values, the latter computed during earlier forward passes. In this particular instance of looping over a dataset, that is not the right thing to do.
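To see concretely why this also triggers the in-place error, here is a minimal sketch of the failure mode (hypothetical tensors, not your model): each optimizer step modifies the parameters in place, while the retained graphs of earlier loss terms still expect the parameter values they were built with.

import torch

w1 = torch.ones(2, 2, requires_grad=True)
w2 = torch.ones(2, 2, requires_grad=True)
opt = torch.optim.SGD([w1, w2], lr=0.1)
x = torch.ones(2)

loss = 0
for i in range(3):
    h = w1 @ x                        # h requires grad, so w2 is saved below
    loss = loss + (w2 @ h).sum()      # keeps every previous term's graph alive
    loss.backward(retain_graph=True)  # walks all accumulated graphs again
    opt.step()                        # in-place update bumps w2's version counter
    opt.zero_grad()
    # on the second iteration, backward() revisits the first term's graph and
    # finds w2 at a newer version than the one it saved -> the same
    # "modified by an inplace operation" RuntimeError

The [274, 74] tensor in your traceback is most likely the transposed weight of the i2o layer, saved during an earlier forward pass and then overwritten in place by optimizer.step().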

A simple fix is to accumulate the underlying scalar value of loss via item(), rather than the tensor itself, and to backpropagate only through the current loss tensor:

total_loss = 0

for i in range(len(train)):
    optimizer.zero_grad()
    output, hidden = rnn(train[i].unsqueeze(0), hidden)
    target_class = (target[i] == 1).nonzero(as_tuple=True)[0]

    loss = criterion(output, target_class)
    loss.backward()
    optimizer.step()

    # detach so the next backward() stops at this hidden state instead of
    # reaching back into this iteration's already-freed graph
    hidden = hidden.detach()

    total_loss += loss.item()

Since you update the model's parameters right after each backward pass, you don't need to keep the graph in memory. Note the hidden.detach(): the recurrent hidden state otherwise keeps a reference to the previous iteration's graph, so the next backward() would try to traverse that graph a second time.
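Alternatively, if one parameter update per sequence is acceptable, you can keep accumulating the loss tensor and call backward() just once after the loop; a sketch along those lines:

hidden = rnn.initHidden()
optimizer.zero_grad()

loss = 0
for i in range(len(train)):
    output, hidden = rnn(train[i].unsqueeze(0), hidden)
    target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
    loss = loss + criterion(output, target_class)

loss.backward()   # a single backward pass through the whole unrolled sequence
optimizer.step()

Accumulating is fine here because backward() runs only once, and the parameters are modified only after all gradients have been computed.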