Calculate two losses in a model and backpropagate twice
I am building a model with BertModel to identify answer spans (not using BertForQA).
I have separate linear layers to determine the start and end tokens. In __init__():
self.start_linear = nn.Linear(h, output_dim)
self.end_linear = nn.Linear(h, output_dim)
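For context, here is a minimal sketch of what the rest of __init__() might look like; the class name, the pretrained checkpoint, and the self.softmax definition are assumptions inferred from the forward() shown below:

import torch.nn as nn
from transformers import BertModel

class SpanModel(nn.Module):  # hypothetical class name
    def __init__(self, h, output_dim):
        super().__init__()
        # assumed checkpoint; any BERT encoder with hidden size h would do
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.start_linear = nn.Linear(h, output_dim)
        self.end_linear = nn.Linear(h, output_dim)
        self.softmax = nn.Softmax(dim=-1)  # referenced by forward() below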
In forward(), I return a predicted start distribution and a predicted end distribution:
def forward(self, input_ids, attention_mask):
    outputs = self.bert(input_ids, attention_mask)  # input = bert tokenizer encoding
    lhs = outputs.last_hidden_state  # (batch_size, sequence_length, hidden_size)
    out = lhs[:, -1, :]  # (batch_size, hidden_dim)
    st = self.start_linear(out)
    end = self.end_linear(out)
    predict_start = self.softmax(st)
    predict_end = self.softmax(end)
    return predict_start, predict_end
Then in train_epoch(), I try to backpropagate the two losses separately:
def train_epoch(model, train_loader, optimizer):
    model.train()
    total = 0
    st_loss, st_correct, st_total_loss = 0, 0, 0
    end_loss, end_correct, end_total_loss = 0, 0, 0
    for batch in train_loader:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        start_idx = batch['start'].to(device)
        end_idx = batch['end'].to(device)
        start, end = model(input_ids=input_ids, attention_mask=attention_mask)
        st_loss = model.compute_loss(start, start_idx)
        end_loss = model.compute_loss(end, end_idx)
        st_total_loss += st_loss.item()
        end_total_loss += end_loss.item()
        # perform backward propagation to compute the gradients
        st_loss.backward()
        end_loss.backward()
        # update the weights
        optimizer.step()
But then on the line end_loss.backward() I get:
Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
Should I do the backward passes separately? Or should I do this another way? Thanks!
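(For reference, the change the error message itself suggests would look like this inside the loop above; I'm not sure it's the right approach:)

# keep the saved activations alive so a second backward pass is possible
st_loss.backward(retain_graph=True)
end_loss.backward()
# update the weights with gradients accumulated from both passes
optimizer.step()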
The standard procedure is simply to sum the losses and backpropagate on the sum.
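A minimal sketch of what that looks like inside the training loop above (only the backward section changes):

# sum the two losses into a single scalar and backpropagate once
loss = st_loss + end_loss
loss.backward()
# update the weights
optimizer.step()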
Make sure the two losses you are summing take values that are, on average, roughly the same size, or at least proportional to how important you want each loss to be relative to the other (otherwise the model will optimize for the larger loss at the expense of the smaller one). In the span-detection case I would guess this isn't necessary, given the obvious symmetry of the problem.
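If you did need to rebalance them, a weighted sum is the usual sketch; the weights here are hypothetical values you would tune:

w_start, w_end = 1.0, 1.0  # hypothetical weights; tune so the weighted losses are comparable
loss = w_start * st_loss + w_end * end_loss
loss.backward()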