How to get around an in-place operation error when indexing a leaf variable for a gradient update?

I run into an in-place operation error when I try to index a leaf variable in order to update its gradient with a customized shrink function. I cannot figure out how to get around it. Any help is much appreciated!

import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable, Function

# hyper parameters
batch_size = 100 # batch size of images
ld = 0.2 # sparse penalty
lr = 0.1 # learning rate

x = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,10,10))), requires_grad=False)  # original

# depends on size of the dictionary, number of atoms.
D = Variable(torch.from_numpy(np.random.normal(0,1,(500,10,10))), requires_grad=True)

# hx sparse representation
ht = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,500,1,1))), requires_grad=True)

# Dictionary loss function
loss = nn.MSELoss()

# customized shrink function to update gradient
shrink_ht = lambda x: torch.stack([torch.sign(i)*torch.max(torch.abs(i)-lr*ld,0)[0] for i in x])

### sparse representation optimizer_ht, single image.
optimizer_ht = torch.optim.SGD([ht], lr=lr, momentum=0.9) # optimizer for sparse representation

## update for the batch
for idx in range(len(x)):
    optimizer_ht.zero_grad() # clear up gradients
    loss_ht = 0.5*torch.norm((x[idx]-(D*ht[idx]).sum(dim=0)),p=2)**2
    loss_ht.backward() # back propagation and calculate gradients
    optimizer_ht.step() # update parameters with gradients
    ht[idx] = shrink_ht(ht[idx])  # customized shrink function.

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
     15     loss_ht.backward() # back propagation and calculate gradients
     16     optimizer_ht.step() # update parameters with gradients
---> 17     ht[idx] = shrink_ht(ht[idx]) # customized shrink function.
     18
     19

/home/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py in __setitem__(self, key, value)
     85             return MaskedFill.apply(self, key, value, True)
     86         else:
---> 87             return SetItem.apply(self, key, value)
     88
     89     def __deepcopy__(self, memo):

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

Specifically, the line below seems to be the one that fails, because it indexes and updates the leaf variable at the same time:

ht[idx] = shrink_ht(ht[idx])  # customized shrink function.

Thanks.

W.S.

The problem is that ht requires grad:

ht = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,500,1,1))), requires_grad=True)

For variables that require gradients, PyTorch does not allow you to assign to (slices of) them. So you cannot do:

ht[idx] = some_tensor

This means you need to find another way to perform your customized shrink function, using built-in PyTorch functions such as squeeze, unsqueeze, etc.
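For instance, the soft-threshold inside shrink_ht can be expressed with element-wise built-ins (torch.sign, torch.abs, torch.clamp) applied to the whole tensor at once, with no Python loop. This is only a minimal sketch of that idea, not the exact function from the question:

import torch

lr, ld = 0.1, 0.2  # learning rate and sparse penalty, as in the question

# element-wise soft-threshold: sign(t) * max(|t| - lr*ld, 0)
def shrink(t):
    return torch.sign(t) * torch.clamp(torch.abs(t) - lr * ld, min=0)

h = torch.randn(500, 1, 1)
print(shrink(h).shape)  # torch.Size([500, 1, 1])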

Another option is to assign the slice shrink_ht(ht[idx]) to a different variable or tensor that does not require gradients.
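A minimal sketch of that option, keeping the shrunk values in a separate detached tensor (ht_shrunk and the element-wise shrink below are names introduced here for illustration, not part of the original code):

import torch

lr, ld = 0.1, 0.2
shrink = lambda t: torch.sign(t) * torch.clamp(torch.abs(t) - lr * ld, min=0)

ht = torch.randn(100, 500, 1, 1, requires_grad=True)  # leaf variable

# keep the shrunk values in a detached copy that does not require grad,
# so the index assignment never touches the leaf variable itself
ht_shrunk = ht.detach().clone()
for idx in range(ht.shape[0]):
    ht_shrunk[idx] = shrink(ht[idx].detach())  # no in-place error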

I just found out: to update the variable, it needs to be ht.data[idx] instead of ht[idx]. We can use .data to access the underlying tensor directly.
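A small self-contained example of the difference (the shapes mirror the question's ht):

import torch

ht = torch.randn(100, 500, 1, 1, requires_grad=True)  # leaf variable

# ht[3] = torch.zeros(500, 1, 1)     # RuntimeError: in-place operation on a leaf
ht.data[3] = torch.zeros(500, 1, 1)  # ok: .data accesses the underlying tensor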

Using ht.data[idx] works fine here, but the newer convention is to wrap the update in torch.no_grad() explicitly, for example:

with torch.no_grad(): 
    ht[idx] = shrink_ht(ht[idx])
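A quick standalone check of this pattern; shrink_ht is written here as a simple element-wise soft-threshold for illustration, not the exact function from the question:

import torch

lr, ld = 0.1, 0.2
shrink_ht = lambda t: torch.sign(t) * torch.clamp(torch.abs(t) - lr * ld, min=0)

ht = torch.randn(100, 500, 1, 1, requires_grad=True)  # leaf variable

with torch.no_grad():          # updates inside this block are not tracked by autograd
    ht[3] = shrink_ht(ht[3])   # no in-place error on the leaf

print(ht.requires_grad)        # True: ht is still a trainable leaf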

Note that this in-place operation has no gradient. In other words, gradients only flow back to the shrunk values of ht, not to the unshrunk ht.