How to transform output of NN, while still being able to train?
I have a neural network which outputs output. I want to transform output before the loss and back-propagation happen.

Here is my general code:
with torch.set_grad_enabled(training):
    outputs = net(x_batch[:, 0], x_batch[:, 1])  # the prediction of the NN

    # My issue is here:
    outputs = transform_torch(outputs)
    loss = my_loss(outputs, y_batch)

    if training:
        scheduler.step()
        loss.backward()
        optimizer.step()
Following a suggestion, I have a transformation function that I run my outputs through:
def transform_torch(predictions):
    new_tensor = []
    for i in range(int(len(predictions))):
        arr = predictions[i]
        a = arr.clone().detach()
        # My transformation, which results in a positive first element, and the other
        # elements represent decrements of the first positive element.
        b = torch.negative(a)
        b[0] = abs(b[0])
        new_tensor.append(torch.cumsum(b, dim=0))
        # new_tensor[i].requires_grad = True
    new_tensor = torch.stack(new_tensor, 0)
    return new_tensor
Note: in addition to clone().detach(), I also tried the other suggested approach, with similar results.

My problem is that this transformed tensor does not actually train at all.

If I try to modify the tensor in place (e.g. modify arr directly), Torch complains that I cannot modify a tensor that requires grad in place.

Any suggestions?
How about extracting the gradient from the tensor with something like

grad = output.grad

and assigning that same gradient to the new tensor after the transformation?
Calling detach on your predictions stops gradients from propagating to your model. Nothing you do after that point can change your parameters.
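As a quick illustration (a minimal sketch with a throwaway torch.nn.Linear layer, not your actual net), you can see how a detach in the middle of the computation leaves the weights without any gradient:

import torch

# Hypothetical stand-in model, just to show the effect of detach.
lin = torch.nn.Linear(3, 3)
x = torch.rand(2, 3)

# Without detach: gradients reach the layer's weights.
lin(x).sum().backward()
print(lin.weight.grad is not None)  # True

# With detach: the graph is cut, so backward() either raises an error
# (nothing requires grad) or the weights never receive a gradient.
lin.zero_grad()
out = lin(x).detach()
try:
    out.sum().backward()
except RuntimeError as e:
    print("backward failed:", e)
print(lin.weight.grad)  # None or all zeros - no learning signal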
How about modifying your code to avoid this:
def transform_torch(predictions):
    b = torch.cat([predictions[:, :1, ...].abs(), -predictions[:, 1:, ...]], dim=1)
    new_tensor = torch.cumsum(b, dim=1)
    return new_tensor
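To see the intended semantics (a positive first element, with the remaining elements acting as decrements of it), here is a small hand-checked example; the (1, 3) input shape and values are just an assumption for illustration:

import torch

def transform_torch(predictions):
    b = torch.cat([predictions[:, :1, ...].abs(), -predictions[:, 1:, ...]], dim=1)
    return torch.cumsum(b, dim=1)

pred = torch.tensor([[1.0, 2.0, 3.0]], requires_grad=True)
print(transform_torch(pred))
# tensor([[ 1., -1., -4.]], grad_fn=<CumsumBackward0>)
# abs(1) = 1, then 1 - 2 = -1, then -1 - 3 = -4; the grad_fn (its exact name may
# vary by PyTorch version) shows the autograd graph stays intact.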
You can run a small test to verify that gradients do propagate through this transformation:
# start with some random tensor representing the input predictions
# make sure it requires_grad
pred = torch.rand((4, 5, 2, 3)).requires_grad_(True)
# transform it
tpred = transform_torch(pred)
# make up some "default" loss function and back-prop
tpred.mean().backward()
# check to see all gradients of the original prediction:
pred.grad
# as you can see, all gradients are non-zero
Out[]:
tensor([[[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]],

        [[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]],

        [[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]],

        [[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]]])
If you try this small test with your original code, you will either get an error telling you that you are trying to propagate through tensors that do not require_grad, or you will get no grad for the input pred.
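For completeness, here is a hedged sketch of what that failure looks like with the loop-based transform from the question (renamed transform_torch_detached here); the exact error message may differ between PyTorch versions:

import torch

def transform_torch_detached(predictions):
    # the question's per-row version, where clone().detach() cuts the graph
    new_tensor = []
    for i in range(len(predictions)):
        a = predictions[i].clone().detach()
        b = torch.negative(a)
        b[0] = abs(b[0])
        new_tensor.append(torch.cumsum(b, dim=0))
    return torch.stack(new_tensor, 0)

pred = torch.rand((4, 5), requires_grad=True)
tpred = transform_torch_detached(pred)

try:
    tpred.mean().backward()
except RuntimeError as e:
    print(e)         # e.g. "element 0 of tensors does not require grad ..."
print(pred.grad)     # None - no gradient ever reached the input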