Can't backward pass two losses in Classification Transformer Model
For my model I use a RoBERTa transformer model and the Trainer from the Hugging Face transformers library.
I calculate two losses: lloss is a cross-entropy loss and dloss calculates the loss between the hierarchy levels.
The total loss is the sum of lloss and dloss. (Based on this.)
However, when calling total_loss.backward() I get the error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed
Any idea why this is happening? Can I force it to call backward only once? Here is the loss calculation part:
dloss = self.calculate_dloss(prediction, labels, 3)
lloss = self.calculate_lloss(prediction, labels, 3)
total_loss = lloss + dloss
total_loss.backward()
def calculate_lloss(self, predictions, true_labels, total_level):
    '''Calculates the layer loss.'''
    loss_fct = nn.CrossEntropyLoss()
    lloss = 0
    for l in range(total_level):
        lloss += loss_fct(predictions[l], true_labels[l])
    return self.alpha * lloss
def calculate_dloss(self, predictions, true_labels, total_level):
    '''Calculates the dependence loss.'''
    dloss = 0
    for l in range(1, total_level):
        current_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l]), dim=1)
        prev_lvl_pred = torch.argmax(nn.Softmax(dim=1)(predictions[l-1]), dim=1)
        D_l = self.check_hierarchy(current_lvl_pred, prev_lvl_pred, l)  # just a boolean tensor
        l_prev = torch.where(prev_lvl_pred == true_labels[l-1], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))
        l_curr = torch.where(current_lvl_pred == true_labels[l], torch.FloatTensor([0]).to(self.device), torch.FloatTensor([1]).to(self.device))
        dloss += torch.sum(torch.pow(self.p_loss, D_l * l_prev) * torch.pow(self.p_loss, D_l * l_curr) - 1)
    return self.beta * dloss
There is nothing wrong with the total loss being the sum of two individual losses. Here is a small proof of principle adapted from the docs:
import torch
import numpy
from sklearn.datasets import make_blobs

class Feedforward(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Feedforward, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(self.hidden_size, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        output = self.sigmoid(output)
        return output

def blob_label(y, label, loc):  # assign labels
    target = numpy.copy(y)
    for l in loc:
        target[y == l] = label
    return target

x_train, y_train = make_blobs(n_samples=40, n_features=2, cluster_std=1.5, shuffle=True)
x_train = torch.FloatTensor(x_train)
y_train = torch.FloatTensor(blob_label(y_train, 0, [0]))
y_train = torch.FloatTensor(blob_label(y_train, 1, [1, 2, 3]))

x_test, y_test = make_blobs(n_samples=10, n_features=2, cluster_std=1.5, shuffle=True)
x_test = torch.FloatTensor(x_test)
y_test = torch.FloatTensor(blob_label(y_test, 0, [0]))
y_test = torch.FloatTensor(blob_label(y_test, 1, [1, 2, 3]))

model = Feedforward(2, 10)
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.eval()
y_pred = model(x_test)
before_train = criterion(y_pred.squeeze(), y_test)
print('Test loss before training', before_train.item())

model.train()
epoch = 20
for epoch in range(epoch):
    optimizer.zero_grad()
    # Forward pass
    y_pred = model(x_train)
    # Compute loss: the total loss is the sum of two separate losses
    lossCE = criterion(y_pred.squeeze(), y_train)
    lossSQD = (y_pred.squeeze() - y_train).pow(2).mean()
    loss = lossCE + lossSQD
    print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
    # Backward pass
    loss.backward()
    optimizer.step()
There must be a second, direct or indirect, call to backward on some variable that then traverses your graph. Asking for the complete code here would be a bit much; only you can check this, or at least reduce it to a minimal example (and in doing so you may already spot the problem). Apart from that, I would start checking:
- Does it already occur in the first training iteration? If not: are you reusing any computation results for the second iteration without a detach?
- What happens when you backward your losses individually, lloss.backward() followed by dloss.backward()? (This has the same effect as summing them first, since gradients are accumulated.) This will let you track down which of the two losses the error occurs for; a small sketch of this check follows below.
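A minimal sketch of that second check, using toy tensors instead of the transformer outputs from the question (all names and values here are purely illustrative). Because both losses come from the same forward pass and therefore share one graph, the first separate backward call needs retain_graph=True so the second call can still traverse it:

import torch

# Toy stand-ins for the model output and the two losses (illustrative only).
w = torch.randn(5, requires_grad=True)
x = torch.randn(5)
out = w * x

lloss = out.sum()          # stand-in for the layer loss
dloss = (out ** 2).mean()  # stand-in for the dependence loss

# Backward each loss separately. Gradients accumulate in w.grad, so the result
# equals (lloss + dloss).backward(); since both losses share one graph, the
# first call must keep it alive with retain_graph=True.
lloss.backward(retain_graph=True)
dloss.backward()
print(w.grad)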
After backward() your computational graph is freed, so for a second backward you need to create a new graph by providing the inputs again. If you want to reiterate the same graph after backward (for some reason), you need to set the retain_graph flag of backward to True. See retain_graph here.
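A minimal sketch of both options with toy tensors (the names here are illustrative and not taken from the question's code):

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# One forward pass builds one graph; backward() then frees it.
loss = (w * x).sum()
loss.backward()

# Option 1: build a fresh graph by running the forward pass again.
loss = (w * x).sum()
loss.backward()

# Option 2: keep the graph alive so it can be traversed twice.
loss = (w * x).sum()
loss.backward(retain_graph=True)
loss.backward()  # without retain_graph above, this second call would raise
                 # "Trying to backward through the graph a second time"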
P.S.: Since the summation of tensors is automatically differentiable, summing the losses will not cause any problem in the backward.