Why do we need to clone grad_output and assign it to grad_input when defining a ReLU autograd function?

Question

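I'm walking through the autograd part of the PyTorch tutorials. I have two questions:

Why do we need to clone grad_output and assign it to grad_input, rather than doing a simple assignment during backpropagation?
What's the purpose of grad_input[input < 0] = 0? Does it mean we don't update the gradient when the input is less than zero?

Here's the code: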

import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

Link here: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-defining-new-autograd-functions

Thanks a lot.

Answer 1

Why do we need to clone grad_output and assign it to grad_input, rather than doing a simple assignment during backpropagation?

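tensor.clone() creates a copy of the tensor that mimics the original tensor's requires_grad field. clone is a way to copy a tensor while still keeping the copy as part of the computation graph it came from.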

So grad_input is part of the same computation graph as grad_output, and if we compute the gradient for grad_output, the same will be done for grad_input.
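To illustrate the point about clone staying in the graph, here is a minimal sketch. The tensor names a and b and their values are made up for this example and are not from the tutorial.

```python
import torch

# Hypothetical leaf tensor, used only for illustration.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = a.clone()  # a copy that stays connected to a in the computation graph

b_requires_grad = b.requires_grad      # mirrors a.requires_grad
b_has_grad_fn = b.grad_fn is not None  # the clone is recorded as an operation

# Gradients computed through the clone flow back to the original tensor.
(b * 2).sum().backward()
a_grad = a.grad.tolist()
```

Here the gradient of the sum with respect to a is 2 for every element, even though the multiplication was applied to the clone b.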

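Since we modify grad_input in place, we clone it first. A plain assignment (grad_input = grad_output) would only create a second name for the same tensor, so the in-place write grad_input[input < 0] = 0 would also overwrite grad_output, which may still be needed elsewhere in the backward pass.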
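The difference between a plain assignment and clone() can be seen in a small sketch. The values and the names inp and alias are assumptions chosen for illustration; this is not code from the tutorial.

```python
import torch

# Hypothetical values, chosen only for illustration.
inp = torch.tensor([-1.0, 0.5, -2.0])
grad_output = torch.tensor([1.0, 2.0, 3.0])

# Plain assignment: both names refer to the same tensor object.
alias = grad_output
alias[inp < 0] = 0  # the in-place write also changes grad_output
aliased_result = grad_output.tolist()

# With clone(): the copy has its own storage.
grad_output = torch.tensor([1.0, 2.0, 3.0])
grad_input = grad_output.clone()
grad_input[inp < 0] = 0  # grad_output is left untouched
```

After the first part, grad_output itself has been zeroed at the masked positions. After the second part, only grad_input has changed.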

What's the purpose of grad_input[input < 0] = 0? Does it mean we don't update the gradient when the input is less than zero?

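This follows from the definition of the ReLU function, f(x) = max(0, x): f(x) = 0 if x <= 0, and f(x) = x otherwise. By the chain rule, the gradient with respect to the input is grad_output multiplied by f'(x). In the first case, when x < 0, the derivative of f(x) with respect to x is f'(x) = 0, so we set grad_input[input < 0] = 0. In the second case, f'(x) = 1, so we simply pass grad_output through to grad_input (like an open gate). In other words, the gradient is still computed for negative inputs; it is just zero there. Note that the mask in the code is strict (input < 0), so at exactly input == 0, where the derivative is not defined mathematically, this implementation passes the gradient through.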
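As a quick check of the explanation above, the following sketch runs the MyReLU class from the question on a few sample inputs. The input values and the weight vector w are assumptions made for this example; w is only there so that grad_output is not all ones.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# Hypothetical inputs: two negative and two positive values.
x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)
w = torch.tensor([10.0, 20.0, 30.0, 40.0])

# The loss is sum(w * relu(x)), so grad_output arriving in backward equals w.
(MyReLU.apply(x) * w).sum().backward()
x_grad = x.grad.tolist()

# Same computation with the built-in ReLU, for comparison.
x_ref = x.detach().clone().requires_grad_(True)
(torch.relu(x_ref) * w).sum().backward()
ref_grad = x_ref.grad.tolist()
```

For the negative inputs the gradient is zeroed, and for the positive inputs the corresponding entry of w passes through unchanged, which matches the built-in torch.relu on these inputs.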

backpropagation

pytorch

autograd