Why should the backward function be called only on a one-element tensor, or with gradients w.r.t. the Variable?

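I'm new to PyTorch. I want to understand why we can't call the backward function on a Variable containing a tensor of size [2,2]. If we do want to call it on such a Variable, we first have to define a gradient tensor and then call backward on the Variable with respect to that gradient.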
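A minimal reproduction of the situation in the question, written against the current API (Variable was merged into Tensor in PyTorch 0.4, so a tensor with requires_grad=True plays the role of the Variable). The names x and y are illustrative. It shows that backward() with no arguments fails on a [2,2] output and succeeds once a gradient tensor of the same shape is supplied:

```python
import torch

# A 2x2 tensor tracked by autograd (Variable was merged into Tensor in PyTorch 0.4).
x = torch.ones(2, 2, requires_grad=True)
y = x * 3  # non-scalar output with shape [2, 2]

error_message = None
try:
    y.backward()  # no gradient supplied for a non-scalar output
except RuntimeError as err:
    error_message = str(err)
print("backward() without arguments failed:", error_message)

# Supplying a gradient tensor with the same shape as y makes the call valid.
y.backward(torch.ones_like(y))
print(x.grad)  # every entry is 3.0, since d(3 * x)/dx = 3
```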

From the tutorial on autograd:

If you want to compute the derivatives, you can call .backward() on a Variable. If Variable is a scalar (i.e. it holds a one element data), you don’t need to specify any arguments to backward(), however if it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.
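One way to read the grad_output argument: calling y.backward(v) on a non-scalar y computes a vector-Jacobian product, which gives the same leaf gradients as calling backward() with no arguments on the scalar (v * y).sum(). A sketch with illustrative values for x and v:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 10.0])  # the "grad_output" tensor, same shape as y

# Route 1: non-scalar backward with an explicit gradient argument.
y = x ** 2
y.backward(v)
grad_from_vector = x.grad.clone()

# Route 2: reduce to a scalar first, then call backward() with no arguments.
x.grad = None  # clear the gradient accumulated by route 1
(v * x ** 2).sum().backward()
grad_from_scalar = x.grad.clone()

print(grad_from_vector)  # equals 2 * x * v
print(torch.allclose(grad_from_vector, grad_from_scalar))
```

This is why autograd can get away with supporting only scalar functions: a non-scalar start is handled as if the output were first weighted by the supplied gradient and summed.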

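Basically, to start the chain rule you need a gradient at the output to get it going. If the output is a scalar loss function (which it usually is; normally you begin the backward pass at the loss variable), its implied gradient is 1.0.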
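In chain-rule terms (notation introduced here for illustration: L is the scalar loss, y an intermediate tensor and x a leaf tensor, both flattened to vectors), each backward step multiplies the incoming gradient by the transposed Jacobian, so the pass needs a starting value. When backward starts at L, that seed is dL/dL = 1; when it starts at a non-scalar y, the seed is the tensor dL/dy, which is the "dout / dy" in the developer's quote:

```latex
\frac{\partial L}{\partial L} = 1,
\qquad
\frac{\partial L}{\partial x}
  = \left(\frac{\partial y}{\partial x}\right)^{\top} \frac{\partial L}{\partial y},
\qquad
\left(\frac{\partial y}{\partial x}\right)_{ij} = \frac{\partial y_i}{\partial x_j}
```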

From the tutorial:

let's backprop now: out.backward() is equivalent to doing out.backward(torch.Tensor([1.0]))
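The tutorial line appears to date from an older API in which a scalar result was a one-element tensor; with the 0-dim scalar tensors of current PyTorch, the matching explicit seed is torch.tensor(1.0). A minimal check with illustrative values:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Implicit seed: out is a 0-dim (scalar) tensor, so backward() needs no argument.
out = (x ** 2).sum()
out.backward()
implicit = x.grad.clone()

# Explicit seed of 1.0 with the same 0-dim shape as out.
x.grad = None
out = (x ** 2).sum()
out.backward(torch.tensor(1.0))
explicit = x.grad.clone()

print(torch.equal(implicit, explicit))  # True
```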

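But maybe you only want to update a subgraph (somewhere deep in the network), and the value of the Variable is a matrix of weights. Then you have to tell it where to begin. From one of their lead developers (somewhere in the links):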

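Yes, that's correct. We only support differentiation of scalar functions, so if you want to start backward from a non-scalar value you need to provide dout / dy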
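A sketch of starting backward in the middle of a graph, assuming a hypothetical two-stage model (W, x and h are illustrative names, not from the original). The chain rule is split at the non-scalar node h: torch.autograd.grad supplies dloss/dh, and seeding h.backward with it reproduces the gradient of the full pass from the scalar loss:

```python
import torch

torch.manual_seed(0)

# Hypothetical model: h = W @ x is an intermediate, non-scalar node.
W = torch.randn(3, 4, requires_grad=True)
x = torch.randn(4)
h = W @ x
loss = torch.tanh(h).sum()

# Reference: full backward pass from the scalar loss (keep the graph for reuse).
loss.backward(retain_graph=True)
grad_full = W.grad.clone()

# Split the chain rule at h: first get dloss/dh from the upper part of the graph...
(dloss_dh,) = torch.autograd.grad(loss, h, retain_graph=True)

# ...then start backward at the non-scalar h, supplying dloss/dh as the seed.
W.grad = None
h.backward(dloss_dh)
grad_split = W.grad.clone()

print(torch.allclose(grad_full, grad_split))  # True
```

In practice the seed for h would come from wherever the upstream gradient is known; torch.autograd.grad is used here only so the result can be checked against the full pass.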

The gradients argument

https://discuss.pytorch.org/t/how-the-backward-works-for-torch-variable/907/8 (good explanation)

Very good explanation

http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html (tutorial)

