What is the loss function used in Trainer from the Transformers library of Hugging Face?

Question

I'm trying to fine-tune a BERT model using the Trainer class from Hugging Face's Transformers library.

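In their documentation, they mention that you can specify a custom loss function by overriding the compute_loss method of the class. However, if I don't override the method and use the Trainer directly to fine-tune a BERT model for sentiment classification, what is the default loss function? Is it categorical cross-entropy? Thanks!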

Answer 1

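It depends! Especially given your relatively vague description of the setup, it is not clear which loss will be used. But to start from the beginning, let's first look at what the default compute_loss() function in the Trainer class does.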

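You can find the corresponding function here if you want to have a look yourself (the current version at the time of writing is 4.17). The actual loss that will be returned with default parameters is taken from the model's output values: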

loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]

This means that the model itself is (by default) responsible for computing some sort of loss and returning it in outputs.
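
To make that line concrete, here is a minimal pure-Python sketch of the same rule. It assumes, as in transformers 4.x, that the model returns either a dict-like ModelOutput (which subclasses OrderedDict) or a plain tuple whose first element is the loss when labels were passed. The helper name extract_loss is hypothetical and not part of the library.

```python
from collections import OrderedDict
from typing import Any, Mapping, Sequence, Union


def extract_loss(outputs: Union[Mapping[str, Any], Sequence[Any]]) -> Any:
    """Hypothetical helper mirroring the Trainer's default loss extraction.

    Dict-like outputs (e.g. a ModelOutput) are indexed by the "loss" key;
    tuple outputs are assumed to carry the loss in position 0.
    """
    if isinstance(outputs, dict):
        return outputs["loss"]
    return outputs[0]


# Dict-style output, as a model returns with return_dict=True
dict_outputs = OrderedDict(loss=0.25, logits=[[1.0, -1.0]])
# Tuple-style output, as a model returns with return_dict=False
tuple_outputs = (0.25, [[1.0, -1.0]])

print(extract_loss(dict_outputs))   # 0.25
print(extract_loss(tuple_outputs))  # 0.25
```

Either way, the Trainer only reads the value; which loss it is gets decided inside the model class.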

Next, we can look at the actual model definitions for BERT (source: here), and in particular check out the model that will be used in your sentiment analysis task (I assume a BertForSequenceClassification model).

The code relevant for defining the loss function looks like this:

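import torch
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss

# Excerpt from BertForSequenceClassification.forward(): `logits` are the
# classifier outputs and `labels` are the targets passed to the model.
if labels is not None: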
    if self.config.problem_type is None:
        if self.num_labels == 1:
            self.config.problem_type = "regression"
        elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
            self.config.problem_type = "single_label_classification"
        else:
            self.config.problem_type = "multi_label_classification"

    if self.config.problem_type == "regression":
        loss_fct = MSELoss()
        if self.num_labels == 1:
            loss = loss_fct(logits.squeeze(), labels.squeeze())
        else:
            loss = loss_fct(logits, labels)
    elif self.config.problem_type == "single_label_classification":
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    elif self.config.problem_type == "multi_label_classification":
        loss_fct = BCEWithLogitsLoss()
        loss = loss_fct(logits, labels)
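
The branching above can be summarised as a small lookup. Below is a hypothetical pure-Python sketch (not library code) that reproduces the problem_type inference and maps each result to the loss class used in the excerpt. To keep it runnable without PyTorch, the label dtype is passed as a string, with "int64" and "int32" standing in for torch.long and torch.int.

```python
from typing import Optional


def infer_problem_type(num_labels: int, labels_dtype: str,
                       problem_type: Optional[str] = None) -> str:
    """Hypothetical re-implementation of the problem_type inference above."""
    if problem_type is not None:
        # An explicitly configured problem_type is used as-is.
        return problem_type
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_dtype in ("int64", "int32"):
        return "single_label_classification"
    return "multi_label_classification"


# Loss class paired with each problem type in the excerpt above
LOSS_FOR_PROBLEM_TYPE = {
    "regression": "MSELoss",
    "single_label_classification": "CrossEntropyLoss",
    "multi_label_classification": "BCEWithLogitsLoss",
}


def default_loss_name(num_labels: int, labels_dtype: str,
                      problem_type: Optional[str] = None) -> str:
    return LOSS_FOR_PROBLEM_TYPE[infer_problem_type(num_labels, labels_dtype, problem_type)]


# Binary sentiment with integer class ids -> cross-entropy
print(default_loss_name(2, "int64"))    # CrossEntropyLoss
# Same two labels but float (multi-hot) targets -> BCE with logits
print(default_loss_name(2, "float32"))  # BCEWithLogitsLoss
# A single output unit (e.g. a sentiment score) -> MSE
print(default_loss_name(1, "float32"))  # MSELoss
```

Note that in the original code the inferred value is written back to self.config.problem_type, so it stays fixed after the first batch that contains labels.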

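Based on this information, you should be able to either set the correct loss function yourself (by changing model.config.problem_type accordingly), or at least determine which loss will be chosen based on the hyperparameters of your task (number of labels, label scores, etc.).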
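
For the typical sentiment setup (num_labels > 1 and integer class ids as labels), the branch taken is CrossEntropyLoss, which is categorical cross-entropy computed from raw logits (log-softmax followed by negative log-likelihood) and averaged over the batch by default. The pure-Python sketch below illustrates that computation under those defaults (no class weights, no label smoothing); it is not PyTorch's implementation, and the name cross_entropy_from_logits is hypothetical.

```python
import math
from typing import List


def cross_entropy_from_logits(logits: List[List[float]], labels: List[int]) -> float:
    """Mean categorical cross-entropy over a batch, computed from raw logits.

    Intended to match torch.nn.CrossEntropyLoss with its defaults
    (reduction="mean", no weights, no label smoothing) for class-index targets.
    """
    total = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp: subtract the max logit first.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # -log softmax(row)[label] = logsumexp(row) - row[label]
        total += log_sum_exp - row[label]
    return total / len(labels)


# Two equal logits give probability 0.5 for the true class, so the loss is ln 2
print(cross_entropy_from_logits([[0.0, 0.0]], [0]))  # 0.6931471805599453
```

If your labels are one-hot or multi-hot float vectors rather than class ids, the dispatch shown earlier selects BCEWithLogitsLoss instead, which is a different objective.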

Tags: python, nlp, artificial-intelligence, machine-learning, huggingface-transformers