Why does autograd not produce gradient for intermediate variables?

Question

I am trying to understand how gradients are represented and how autograd works:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

z.backward()

print(x.grad)
#Variable containing:
#32
#[torch.FloatTensor of size 1]

print(y.grad)
#None

Why does it not produce a gradient for y? If y.grad = dz/dy, shouldn't it at least produce a variable like y.grad = 2*y?

Answer 1

By default, gradients are only retained for leaf variables. Non-leaf variables' gradients are not retained to be inspected later. This was done by design, to save memory.

- Soumith Chintala

See: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94
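To make the leaf / non-leaf distinction concrete, here is a minimal sketch. It assumes a newer PyTorch release (0.4 or later) in which a plain tensor created with requires_grad=True plays the role of Variable; that substitution is an assumption and is not part of the original answer.

```python
import torch

# Assumption: PyTorch 0.4+, where tensors carry autograd state directly.
x = torch.tensor([2.0], requires_grad=True)  # created by the user: a leaf
y = x * x                                    # produced by an operation: non-leaf
z = y * y

z.backward()

print(x.is_leaf)  # True
print(y.is_leaf)  # False
print(x.grad)     # tensor([32.]), retained because x is a leaf
```

Only x is a leaf here, so only x.grad is kept after backward(); the gradient with respect to y is computed during the backward pass but then discarded.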

Option 1:

Call y.retain_grad()

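import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

y.retain_grad()  # ask autograd to keep the gradient of this non-leaf variable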

z.backward()

print(y.grad)
#Variable containing:
# 8
#[torch.FloatTensor of size 1]

Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/16
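The value 8 printed for y.grad is what the chain rule predicts, and it matches the 2*y guessed in the question. As a short worked check (the only inputs are the definitions y = x*x, z = y*y and x = 2 from the question):

```latex
\begin{aligned}
y &= x^2 = 4, \qquad z = y^2 = 16 \quad (x = 2) \\
\frac{\partial z}{\partial y} &= 2y = 8 \\
\frac{\partial z}{\partial x} &= \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x}
  = 2y \cdot 2x = 8 \cdot 4 = 32
\end{aligned}
```

So autograd does compute dz/dy on the way to dz/dx = 32; it just does not store it on y unless asked to.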

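Option 2:

Register a hook, which is basically a function that is called when the gradient is computed. You can then save the gradient, assign it, print it, or do whatever else you need.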

from __future__ import print_function
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

y.register_hook(print)  # the hook can be any callable that takes the gradient

z.backward()

Output:

Variable containing:
 8
[torch.FloatTensor of size 1]

Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/2
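Since the hook can also save the gradient rather than print it, one possible way to do that is sketched below. The names grads and save_grad are hypothetical helpers introduced for illustration, and the snippet assumes a newer PyTorch release (0.4 or later) where torch.tensor replaces Variable.

```python
import torch

grads = {}  # hypothetical container for captured gradients


def save_grad(name):
    """Return a hook that stores the incoming gradient under `name`."""
    def hook(grad):
        grads[name] = grad  # returning None leaves the gradient unchanged
    return hook


# Assumption: PyTorch 0.4+, where tensors carry autograd state directly.
x = torch.tensor([2.0], requires_grad=True)
y = x * x
z = y * y

y.register_hook(save_grad("y"))  # must be registered before backward()

z.backward()

print(grads["y"])  # tensor([8.])
```

Unlike retain_grad(), this keeps the gradient in a structure you control, which can be convenient when collecting gradients from several intermediate values at once.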

See also: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/7


