One of the variables modified by an inplace operation

I am fairly new to PyTorch. I want to use this model to generate some images, but because it was written before PyTorch 1.5, when the in-place gradient checks were tightened, I am getting this error message:

RuntimeError: one of the variables needed for gradient computation has been 
modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 4, 4]] 
is at version 2; expected version 1 instead. 
Hint: enable anomaly detection to find the operation that 
failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
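
For reference, the anomaly detection the hint mentions is enabled once, before the training loop:

    import torch

    # Debugging aid only: it slows training down, but makes autograd
    # report which forward-pass operation produced the failing gradient.
    torch.autograd.set_detect_anomaly(True)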

I have looked at past examples and I am not sure what the problem is. I believe it happens somewhere in the code below, but I don't know where! Any help would be greatly appreciated!

def process(self, images, edges, masks):
    self.iteration += 1

    # zero optimizers
    self.gen_optimizer.zero_grad()
    self.dis_optimizer.zero_grad()


    # process outputs
    outputs = self(images, edges, masks)
    gen_loss = 0
    dis_loss = 0


    # discriminator loss
    dis_input_real = torch.cat((images, edges), dim=1)
    dis_input_fake = torch.cat((images, outputs.detach()), dim=1)
    dis_real, dis_real_feat = self.discriminator(dis_input_real)        # in: (grayscale(1) + edge(1))
    dis_fake, dis_fake_feat = self.discriminator(dis_input_fake)        # in: (grayscale(1) + edge(1))
    dis_real_loss = self.adversarial_loss(dis_real, True, True)
    dis_fake_loss = self.adversarial_loss(dis_fake, False, True)
    dis_loss += (dis_real_loss + dis_fake_loss) / 2


    # generator adversarial loss
    gen_input_fake = torch.cat((images, outputs), dim=1)
    gen_fake, gen_fake_feat = self.discriminator(gen_input_fake)        # in: (grayscale(1) + edge(1))
    gen_gan_loss = self.adversarial_loss(gen_fake, True, False)
    gen_loss += gen_gan_loss


    # generator feature matching loss
    gen_fm_loss = 0
    for i in range(len(dis_real_feat)):
        gen_fm_loss += self.l1_loss(gen_fake_feat[i], dis_real_feat[i].detach())
    gen_fm_loss = gen_fm_loss * self.config.FM_LOSS_WEIGHT
    gen_loss += gen_fm_loss


    # create logs
    logs = [
        ("l_d1", dis_loss.item()),
        ("l_g1", gen_gan_loss.item()),
        ("l_fm", gen_fm_loss.item()),
    ]

    return outputs, gen_loss, dis_loss, logs

def forward(self, images, edges, masks):
    edges_masked = (edges * (1 - masks))
    images_masked = (images * (1 - masks)) + masks
    inputs = torch.cat((images_masked, edges_masked, masks), dim=1)
    outputs = self.generator(inputs)                                    # in: [grayscale(1) + edge(1) + mask(1)]
    return outputs

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward()
    self.dis_optimizer.step()

    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()

Thanks!

You cannot compute the discriminator's and the generator's losses in one go and then run the two backpropagations back to back like this:

if dis_loss is not None:
    dis_loss.backward()
self.dis_optimizer.step()

if gen_loss is not None:
    gen_loss.backward()
self.gen_optimizer.step()

Here is why: when you call self.dis_optimizer.step(), you modify the discriminator's parameters in place, and those are exactly the parameters that were used to compute the gen_loss you are trying to backpropagate. That is not possible.
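
You can reproduce the same failure in isolation with a toy example (on PyTorch 1.5+; nothing here is specific to the question's model):

    import torch

    w = torch.randn(3, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)

    loss1 = (w ** 2).sum()
    loss2 = (w ** 3).sum()   # pow saves w for its backward pass

    loss1.backward()
    opt.step()               # updates w in place: version 0 -> 1
    loss2.backward()         # RuntimeError: ... modified by an inplace operation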

You have to compute dis_loss, backpropagate it, update the discriminator's weights, and clear its gradients. Only then can you compute gen_loss, using the newly updated discriminator weights. Finally, backpropagate on the generator.
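
Applied to the question's code, the training step could be restructured roughly like this. This is an untested sketch that folds backward into process and reuses the question's attribute names:

    def process(self, images, edges, masks):
        self.iteration += 1

        # discriminator step
        self.dis_optimizer.zero_grad()
        outputs = self(images, edges, masks)
        dis_input_real = torch.cat((images, edges), dim=1)
        dis_input_fake = torch.cat((images, outputs.detach()), dim=1)
        dis_real, dis_real_feat = self.discriminator(dis_input_real)
        dis_fake, _ = self.discriminator(dis_input_fake)
        dis_loss = (self.adversarial_loss(dis_real, True, True) +
                    self.adversarial_loss(dis_fake, False, True)) / 2
        dis_loss.backward()
        self.dis_optimizer.step()

        # generator step, evaluated against the updated discriminator
        self.gen_optimizer.zero_grad()
        gen_input_fake = torch.cat((images, outputs), dim=1)
        gen_fake, gen_fake_feat = self.discriminator(gen_input_fake)
        gen_gan_loss = self.adversarial_loss(gen_fake, True, False)

        # feature matching against the (detached) real features from above
        gen_fm_loss = 0
        for i in range(len(dis_real_feat)):
            gen_fm_loss += self.l1_loss(gen_fake_feat[i], dis_real_feat[i].detach())
        gen_fm_loss = gen_fm_loss * self.config.FM_LOSS_WEIGHT
        gen_loss = gen_gan_loss + gen_fm_loss
        gen_loss.backward()
        self.gen_optimizer.step()

        logs = [
            ("l_d1", dis_loss.item()),
            ("l_g1", gen_gan_loss.item()),
            ("l_fm", gen_fm_loss.item()),
        ]
        return outputs, gen_loss, dis_loss, logs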

This tutorial is a good walkthrough of typical GAN training.

This may not be the exact answer to your question, but I got this error while trying to use a "custom" distributed optimizer: while using Cherry's optimizer I accidentally also wrapped the model in a DDP model at the same time. Once I moved the model to the device only in the way cherry expects, I no longer ran into this problem.

Context: https://github.com/learnables/learn2learn/issues/263

This worked for me. See here for details.

def backward(self, gen_loss=None, dis_loss=None):
    if dis_loss is not None:
        dis_loss.backward(retain_graph=True)  # modified here: keep the graph so a second backward pass can run
    self.dis_optimizer.step()

    if gen_loss is not None:
        gen_loss.backward()
    self.gen_optimizer.step()