Why can't I use Cross Entropy Loss for multilabel?
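I am fine-tuning a BERT model for the long-answer task of the Natural Questions dataset. I am training it the same way as a SQuAD model (predicting the start and end tokens).
I use Huggingface and PyTorch.
So the predictions and labels both have shape [batch, 2]. My problem is that I can't pass in "multi-targets", which I think refers to the last dimension being 2.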
RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:18
Should I choose another loss function, or is there another way to get around this problem?
The code I am using:
import torch
import torch.nn as nn

def loss_fn(preds, labels):
    # Called with preds and labels that both have shape [batch, 2]
    return nn.CrossEntropyLoss()(preds, labels)

class DecoderModel(nn.Module):
    def __init__(self, model_args, encoder_config, loss_fn):
        super(DecoderModel, self).__init__()
        # ...

    def forward(self, pooled_output, labels):
        pooled_output = self.dropout(pooled_output)
        logits = self.linear(pooled_output)

        start_logits, end_logits = logits.split(1, dim=-1)
        start_logit = torch.squeeze(start_logits, dim=-1)
        end_logit = torch.squeeze(end_logits, dim=-1)

        # Concatenate into a single prediction tensor
        preds = torch.cat((start_logits, end_logits), -1)

        # Calculate loss
        loss = self.loss_fn(
            preds=preds,
            labels=labels)

        return loss, preds
Target properties: torch.int64 & [3, 2]
Prediction properties: torch.float32 & [3, 2]
Solved - here is my solution
def loss_fn(preds: list, labels):
    # labels is [batch, 2]; split it into start and end labels of shape [batch]
    start_token_labels, end_token_labels = labels.split(1, dim=-1)
    start_token_labels = start_token_labels.squeeze(-1)
    end_token_labels = end_token_labels.squeeze(-1)

    print('*' * 50)
    print(preds[0].shape)  # preds[0] and preds[1] have the same shape and dtype
    print(preds[0].dtype)  # preds[0] and preds[1] have the same shape and dtype
    print(start_token_labels.shape)  # start and end labels have the same shape and dtype
    print(start_token_labels.dtype)  # start and end labels have the same shape and dtype

    start_loss = nn.CrossEntropyLoss()(preds[0], start_token_labels)
    end_loss = nn.CrossEntropyLoss()(preds[1], end_token_labels)

    avg_loss = (start_loss + end_loss) / 2
    return avg_loss
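Basically, I split the logits (by simply not concatenating them) and the labels. Then I compute the cross-entropy loss for each pair and take the average of the two losses. Hope this gives you an idea for solving your own problem!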
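The split-and-average idea described above can be sketched as a self-contained function. This is only a minimal sketch, assuming the start and end logits each have shape [batch, seq_len] and the labels have shape [batch, 2]; the name span_loss and the toy sizes are hypothetical and do not come from the original code.

```python
import torch
import torch.nn as nn


def span_loss(start_logits, end_logits, labels):
    """Average the cross-entropy of the start and end position predictions.

    start_logits, end_logits: float tensors of shape [batch, seq_len] (assumed).
    labels: int64 tensor of shape [batch, 2] holding (start, end) indices.
    """
    criterion = nn.CrossEntropyLoss()
    # Two 1D tensors of shape [batch], which is what CrossEntropyLoss expects.
    start_labels, end_labels = labels.unbind(dim=-1)
    start_loss = criterion(start_logits, start_labels)
    end_loss = criterion(end_logits, end_labels)
    return (start_loss + end_loss) / 2


torch.manual_seed(0)
batch, seq_len = 3, 8  # hypothetical toy sizes
start_logits = torch.randn(batch, seq_len, requires_grad=True)
end_logits = torch.randn(batch, seq_len, requires_grad=True)
labels = torch.randint(0, seq_len, (batch, 2))

loss = span_loss(start_logits, end_logits, labels)
loss.backward()  # gradients flow to both sets of logits
print(loss.shape)
```

Each call to the criterion receives a 1D target of class indices, so the "multi-target" error does not arise.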
You should not give one-hot vectors to CrossEntropyLoss; give it the class labels directly:
Target: (N) where each value is 0≤targets[i]≤C−1 , or (N, d_1, d_2, ..., d_K) with K≥1 in the case of K-dimensional loss.
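To illustrate the K-dimensional case mentioned in the quoted line, here is a small sketch. It assumes (hypothetically) that the scores over token positions for the start and end slots are stacked into one tensor of shape (N, C, 2), where C is the number of classes. A target of shape (N, 2) is then accepted, and with the default mean reduction the result should match the average of two separate 1D-target losses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C = 3, 8  # hypothetical: batch size and number of classes (token positions)

# Input (N, C, d_1) with d_1 = 2: one score per class for the start and end slot.
logits = torch.randn(N, C, 2)
# Target (N, d_1): a class index in [0, C-1] for each slot, not a one-hot vector.
target = torch.randint(0, C, (N, 2))

criterion = nn.CrossEntropyLoss()
joint_loss = criterion(logits, target)

# The same quantity computed slot by slot with 1D targets of shape (N,).
start_loss = criterion(logits[:, :, 0], target[:, 0])
end_loss = criterion(logits[:, :, 1], target[:, 1])
split_loss = (start_loss + end_loss) / 2

print(torch.allclose(joint_loss, split_loss))
```

Note that the class dimension has to come second in the input, which is why the logits here are (N, C, 2) rather than (N, 2, C).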
You can reproduce your error starting from the example in the documentation:
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
But if you change target to target = torch.empty((3, 5), dtype=torch.long).random_(5), you get the error:
RuntimeError: 1D target tensor expected, multi-target not supported
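Use nn.BCELoss instead. Note that it takes probabilities in [0, 1] as input rather than raw logits, so the logits are passed through softmax first; see this example: https://discuss.pytorch.org/t/multi-label-classification-in-pytorch/905/41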
>>> nn.BCELoss()(torch.softmax(input, dim=1), torch.softmax(target.float(), dim=1))
tensor(0.6376, grad_fn=<BinaryCrossEntropyBackward>)
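For a genuinely multi-label target, where several classes can be active for the same sample, a common pattern is a multi-hot float target with one independent sigmoid per class. The sketch below is an assumption about that use case and is not taken from the linked thread; the tensors are toy values. It uses nn.BCEWithLogitsLoss, which applies the sigmoid internally and so accepts raw logits, and compares it with nn.BCELoss on explicit sigmoid probabilities.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5, requires_grad=True)  # raw scores, shape (N, C)
# Multi-hot float target: each row may have any number of active classes.
target = torch.tensor([[1., 0., 1., 0., 0.],
                       [0., 0., 0., 1., 0.],
                       [1., 1., 0., 0., 1.]])

# Takes raw logits; the sigmoid is applied inside the loss.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)
# Takes probabilities, so the sigmoid has to be applied explicitly.
loss_on_probs = nn.BCELoss()(torch.sigmoid(logits), target)

loss_with_logits.backward()
print(torch.allclose(loss_with_logits, loss_on_probs))
```

Whether this fits depends on the task: predicting one start index and one end index is two single-label problems, not a multi-label one.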