PyTorch: "one of the variables needed for gradient computation has been modified by an inplace operation"

I am training a PyTorch RNN on a text file of song lyrics to predict the next character given the current character.

Here is how my RNN is defined:

import torch
import torch.nn as nn
import torch.optim

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # from input, previous hidden state to new hidden state
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        # from input, previous hidden state to output
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        # softmax on output
        self.softmax = nn.LogSoftmax(dim = 1)
    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        #get new hidden state
        hidden = self.i2h(combined)
        #get output
        output = self.i2o(combined)
        #apply softmax
        output = self.softmax(output)
        return output, hidden
    def initHidden(self): 
        return torch.zeros(1, self.hidden_size)

rnn = RNN(input_size = num_chars, hidden_size = 200, output_size = num_chars)
criterion = nn.NLLLoss()

lr = 0.01
optimizer = torch.optim.AdamW(rnn.parameters(), lr = lr)


def train(train, target):
    hidden = rnn.initHidden()
    loss = 0
    for i in range(len(train)):

        # get output, hidden state from rnn given input char, hidden state
        output, hidden = rnn(train[i].unsqueeze(0), hidden)

        # index of the '1' in the one-hot target, i.e. the class index of the correct character
        target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
        loss += criterion(output, target_class)
        loss.backward(retain_graph = True)
        print("done " + str(i) + " loop")
    return output, loss.item() / train.size(0)

When I run my training function, I get this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [274, 74]], which is output 0 of TBackward, is at version 5; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!


Now, when I remove retain_graph = True from loss.backward(), I get this error instead:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.


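The issue is that you are accumulating the loss values (and, along with them, the computational graphs attached to them) in the variable loss, here: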

    loss += criterion(output, target_class)

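In turn, this means that at every iteration you are trying to backpropagate through the current loss as well as the previous loss values, which were computed in earlier forward passes. In this particular case, where you are looping through your dataset, that is not the right thing to do.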
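To make the graph accumulation concrete, here is a minimal sketch. It does not use the RNN from the question; the scalar parameter `w` and the per-step loss `(w * x) ** 2` are hypothetical stand-ins chosen only to show the difference between summing tensors and summing plain floats.

```python
import torch

# Hypothetical one-parameter "model", used only for illustration.
w = torch.tensor(2.0, requires_grad=True)

running_tensor = 0    # sums tensors, so it stays attached to every step's graph
running_float = 0.0   # sums plain Python floats, no graph attached

for x in (1.0, 2.0, 3.0):
    step_loss = (w * x) ** 2
    running_tensor = running_tensor + step_loss
    running_float += step_loss.item()

print(running_tensor.requires_grad)      # True
print(type(running_float).__name__)      # float
```

Calling `backward()` on `running_tensor` inside the loop would therefore revisit the graph of every earlier step, while `running_float` is only a number suitable for logging.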

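A simple fix is to accumulate the loss's underlying value, i.e. the scalar obtained with item(), rather than the tensor itself, and to backpropagate on the current loss tensor only: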

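total_loss = 0
for i in range(len(train)):
    output, hidden = rnn(train[i].unsqueeze(0), hidden)
    target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
    loss = criterion(output, target_class)
    # backpropagate on the current step's loss only
    loss.backward()

    # accumulate the plain Python float, not the tensor and its graph
    total_loss += loss.item()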
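As an alternative sketch, and not the code from the answer above: because `hidden` is passed from one step to the next, each step's graph still reaches back into the earlier steps. One common pattern for this kind of character-level RNN is to sum the per-step losses as a tensor, call `backward()` once after the loop, and only then step the optimizer. Everything below (`TinyRNN`, `train_step`, the toy sizes, and the random data) is hypothetical and chosen for illustration; it assumes the targets are already class indices rather than one-hot rows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_chars, hidden_size, seq_len = 5, 8, 6   # toy sizes, assumptions for the demo

class TinyRNN(nn.Module):
    """Hypothetical stand-in with the same layout as the RNN in the question."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, inp, hidden):
        combined = torch.cat((inp, hidden), 1)
        return self.softmax(self.i2o(combined)), self.i2h(combined)

rnn = TinyRNN(num_chars, hidden_size, num_chars)
criterion = nn.NLLLoss()
optimizer = torch.optim.AdamW(rnn.parameters(), lr=0.01)

def train_step(inputs, targets):
    """inputs: (seq_len, num_chars) one-hot rows; targets: (seq_len,) class indices."""
    hidden = torch.zeros(1, rnn.hidden_size)
    optimizer.zero_grad()
    loss = 0
    for i in range(inputs.size(0)):
        output, hidden = rnn(inputs[i].unsqueeze(0), hidden)
        loss = loss + criterion(output, targets[i].unsqueeze(0))
    loss.backward()    # a single backward pass over the whole sequence
    optimizer.step()   # parameters are modified only after backward has finished
    return loss.item() / inputs.size(0)

# Random toy sequence: predict the next index from the current one.
idx = torch.randint(0, num_chars, (seq_len,))
inputs = F.one_hot(idx, num_chars).float()
targets = torch.roll(idx, -1)
losses = [train_step(inputs, targets) for _ in range(3)]
```

With this layout the graph is traversed exactly once per sequence, so `retain_graph=True` is not needed, and the optimizer's in-place parameter updates happen only after the backward pass is complete.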
