How to wrap PyTorch functions and implement autograd?
I am working through the PyTorch tutorial on Defining new autograd functions. The autograd function I want to implement is a wrapper around torch.nn.functional.max_pool1d. Here is what I have so far:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd as tag

class SquareAndMaxPool1d(tag.Function):
    @staticmethod
    def forward(ctx, input, kernel_size, stride=None, padding=0, dilation=1,
                return_indices=False, ceil_mode=False):
        ctx.save_for_backward(input)
        inputC = input.clone()  # copy input
        inputC *= inputC
        output = F.max_pool1d(inputC, kernel_size, stride=stride,
                              padding=padding, dilation=dilation,
                              return_indices=return_indices,
                              ceil_mode=ceil_mode)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = get_max_pool1d_grad_somehow(grad_output)
        return 2.0 * input * grad_input
My question is: how do I get the gradient of the wrapped function? I know there are probably other ways to do this, since the example I give is so simple, but what I actually want to do fits this framework and requires me to implement an autograd function.
Edit: after looking at this blog post, I decided to try the following for backward:
def backward(ctx, grad_output):
    input, output = ctx.saved_tensors
    grad_input = output.backward(grad_output)
    return 2.0 * input * grad_input
with output added to the saved variables. I then run the following code:
x = np.random.randn(1,1,5)
xT = torch.from_numpy(x)
xT.requires_grad=True
f = SquareAndMaxPool1d.apply
s = torch.sum(f(xT,2))
s.backward()
and I get Bus error: 10.
Say xT is tensor([[[ 1.69533562, -0.21779421, 2.28693953, -0.86688095, -1.01033497]]], dtype=torch.float64); then after calling s.backward() I would expect xT.grad to be tensor([[[ 3.39067124, -0. , 9.14775812, -0. , -2.02066994]]], dtype=torch.float64), i.e. 2*x*grad_of_max_pool, with grad_of_max_pool containing tensor([[[1., 0., 2., 0., 1.]]], dtype=torch.float64).
I have figured out why I get Bus error: 10: the code above seems to call my backward recursively at grad_input = output.backward(grad_output). So I need to find some other way to get the gradient of max_pool1d. I know how to implement that in pure Python, but the result would be much slower than if I could wrap the library code.
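For reference, a rough sketch of the kind of loop-based implementation I mean (illustrative only: it assumes stride equal to kernel_size, no padding and no dilation, and the helper name is made up):

import torch

def slow_square_maxpool1d_grad(x, grad_output, kernel_size):
    # naive loop-based gradient of max_pool1d(x ** 2) w.r.t. x: for each
    # pooling window, the element whose square won the max receives
    # 2 * x * grad_output, everything else gets 0
    grad_input = torch.zeros_like(x)
    for b in range(x.shape[0]):
        for c in range(x.shape[1]):
            for i in range(grad_output.shape[2]):
                start = i * kernel_size
                window = x[b, c, start:start + kernel_size] ** 2
                j = start + int(torch.argmax(window))
                grad_input[b, c, j] = 2.0 * x[b, c, j] * grad_output[b, c, i]
    return grad_input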
You have picked a rather unfortunate example. torch.nn.functional.max_pool1d is not an instance of torch.autograd.Function, because it is a PyTorch built-in, defined in C++ code with autogenerated Python bindings. I am not sure whether it is possible to get at its backward through its interface.
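As a quick sanity check (a minimal sketch), you can see that F.max_pool1d is just a plain callable rather than a Function subclass with a backward to grab:

import torch.autograd as tag
import torch.nn.functional as F

# F.max_pool1d is a regular callable, not an autograd.Function subclass,
# so there is no backward attribute to reach for
print(isinstance(F.max_pool1d, tag.Function))  # False
print(hasattr(F.max_pool1d, 'backward'))       # False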
First of all, in case you have not noticed, you do not need to write any custom code to backpropagate through this formula, because both the power operation and max_pool1d already have backward defined, so their composition is covered by autograd as well. Assuming your goal is to do this as an exercise, I would suggest doing it more manually (without falling back on the backward of max_pool1d). An example follows:
import torch
import torch.nn.functional as F
import torch.autograd as tag

class SquareAndMaxPool1d(tag.Function):
    @staticmethod
    def forward(ctx, input, kernel_size, **kwargs):
        # we're gonna need indices for backward. Currently SquareAnd...
        # never actually returns indices, I left it out for simplicity
        kwargs['return_indices'] = True

        input_sqr = input ** 2
        output, indices = F.max_pool1d(input_sqr, kernel_size, **kwargs)
        ctx.save_for_backward(input, indices)

        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, indices = ctx.saved_tensors

        # first we need to reconstruct the gradient of `max_pool1d`
        # by putting all the output gradient elements (corresponding to
        # input elements which made it through the max_pool1d) in their
        # respective places, the rest has gradient of 0. We do it by
        # scattering it against a tensor of 0s
        grad_output_unpooled = torch.zeros_like(input)
        grad_output_unpooled.scatter_(2, indices, grad_output)

        # then incorporate the gradient of the "square" part of your
        # operator
        grad_input = 2. * input * grad_output_unpooled

        # the docs for backward
        # https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function.backward
        # say that "it should return as many tensors, as there were inputs
        # to forward()". It fails to mention that if an argument was not a
        # tensor, it should return None (I remember reading this somewhere,
        # but can't find it anymore). Anyway, we need to
        # return a (grad_input, None) tuple to avoid a complaint that two
        # outputs were expected
        return grad_input, None
Then we can use the numerical gradient checker to verify that the operation works as expected:
f = SquareAndMaxPool1d.apply
xT = torch.randn(1, 1, 6, requires_grad=True, dtype=torch.float64)
tag.gradcheck(lambda t: f(t, 2), xT)
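As a further check of the earlier point that autograd covers the composition on its own, here is a minimal sketch (assuming the SquareAndMaxPool1d class above is in scope) comparing its gradient against the plain expression built from ** 2 and F.max_pool1d:

x1 = torch.randn(1, 1, 6, dtype=torch.float64, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)

# custom Function vs. the composition handled entirely by autograd
torch.sum(SquareAndMaxPool1d.apply(x1, 2)).backward()
torch.sum(F.max_pool1d(x2 ** 2, 2)).backward()

print(torch.allclose(x1.grad, x2.grad))  # expect True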
Sorry if this does not answer your question of how to get the backward of max_pool1d, but I hope you find my answer useful enough.
The problem with recursive calls that you ran into actually comes from output and from the fact that by default forward and backward run with no_grad, which seems to be the default behaviour inherited from the torch.autograd.Function class declaration. If you check output.grad_fn in forward, it will probably be None, while in backward it will probably link to the function object <SquareAndMaxPool1d...>, hence the recursive calls. If you are still interested in how to do exactly what you asked, here is an example with F.linear:
import torch
import torch.nn as nn
import torch.nn.functional as F

class custom_Linear(nn.Linear):
    def forward(self, _input):
        return Custom_Linear_AGfn_getAround.apply(_input, self.weight, self.bias)

class Custom_Linear_AGfn_getAround(torch.autograd.Function):
    @staticmethod
    def forward(ctx, _input, _weight, _bias):
        print('Custom forward')
        with torch.enable_grad():
            detached_input = _input.detach()
            detached_input.requires_grad_(True)
            detached_weight = _weight.detach()
            detached_weight.requires_grad_(True)
            detached_bias = _bias.detach()
            detached_bias.requires_grad_(True)
            _tmp = F.linear(detached_input, detached_weight, detached_bias)
        ctx.saved_input = detached_input
        ctx.saved_param = detached_weight, detached_bias
        ctx.save_for_backward(_tmp)
        _output = _tmp.detach()
        return _output

    @staticmethod
    def backward(ctx, grad_out):
        print('Custom backward')
        _tmp, = ctx.saved_tensors
        _weight, _bias = ctx.saved_param
        detached_input = ctx.saved_input
        with torch.enable_grad():
            _tmp.backward(grad_out)
        return detached_input.grad, _weight.grad, _bias.grad
Basically it comes down to constructing a small, isolated graph for the part you are interested in without messing up the main graph, and using grad_fn and requires_grad to keep track of what the graphs detach and what the isolated graph needs (a small sanity check is sketched after the list of tricky parts below).
About the tricky parts:

- Detaching the weight and bias: you could do without it, but then either you pass _weight and _bias through save_for_backward, in which case _weight.grad and _bias.grad will be None inside backward BUT will have the correct values once outside, or you pass them through an attribute such as ctx.saved_param, in which case you have to manually return None for the last two returned values of backward (return detached_input.grad, None, None), otherwise you get twice the correct value when you check the weight and bias gradients afterwards, outside backward.
- As mentioned at the beginning, backward and forward of a class inheriting from torch.autograd.Function seem to have a with no_grad behaviour by default. So removing with torch.enable_grad(): in the code above leads to _tmp.grad_fn being None (I could not understand why _tmp had grad_fn set to None and requires_grad set to False in forward by default, despite detached_input requiring a gradient, until I came across https://github.com/pytorch/pytorch/issues/7698).
- I believe, but have not checked, that you might get a double grad_fn for _output if you do not detach it; just as when I leave out with torch.enable_grad() and do not detach the output, _tmp.grad_fn is None in forward but does acquire the <Custom_Linear_AGfn_getAround...> grad_fn in backward (which leads to the infinite recursive calls).
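To round this off, here is the small sanity check mentioned above: a sketch (assuming the custom_Linear and Custom_Linear_AGfn_getAround classes defined earlier) that the isolated-graph version produces the same gradients as a plain nn.Linear with identical parameters.

import torch
import torch.nn as nn

torch.manual_seed(0)
lin_custom = custom_Linear(4, 3)
lin_plain = nn.Linear(4, 3)
with torch.no_grad():
    lin_plain.weight.copy_(lin_custom.weight)
    lin_plain.bias.copy_(lin_custom.bias)

x1 = torch.randn(2, 4, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)

lin_custom(x1).sum().backward()  # prints 'Custom forward' then 'Custom backward'
lin_plain(x2).sum().backward()

print(torch.allclose(x1.grad, x2.grad))                               # expect True
print(torch.allclose(lin_custom.weight.grad, lin_plain.weight.grad))  # expect True
print(torch.allclose(lin_custom.bias.grad, lin_plain.bias.grad))      # expect True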