Hugging Face: return probabilities and class labels from Trainer.predict

Question

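Is there a way to get the probabilities and the predicted class from Trainer.predict?

I checked the documentation page but could not figure it out. As far as I can tell, it returns logits.

Obviously, both the probabilities and the predicted class can be computed with some extra code, but I would like to know whether there is a prebuilt method that does this.

My current output is as follows: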

new_predictions=trainer.predict(dataset_for_future_predicition_after_tokenizer)

new_predictions


PredictionOutput(predictions=array([[-0.43005577,  3.646306  , -0.8073783 , -1.0651836 , -1.3480505 ,
        -1.108013  ],
       [ 3.5415223 , -0.8513837 , -1.8553216 , -0.18011567, -0.35627165,
        -1.8364134 ],
       [-1.0167522 , -0.8911268 , -1.7115675 ,  0.01204597,  1.7177908 ,
         1.0401527 ],
       [-0.82407415, -0.46043932, -1.089274  ,  2.6252217 ,  0.33935028,
        -1.3623345 ]], dtype=float32), label_ids=None, metrics={'test_runtime': 0.0182, 'test_samples_per_second': 219.931, 'test_steps_per_second': 54.983})

Answer 1

As you mentioned, Trainer.predict returns the raw output of the model's prediction, i.e. the logits.
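If you only need probabilities and class ids from the logits you already have, a softmax followed by an argmax is enough. The snippet below is a minimal sketch using NumPy; the `softmax` helper is my own and not part of the Trainer API, and the logits are copied from the output in the question (in practice you would use `new_predictions.predictions`).

```python
import numpy as np

# Logits copied from the PredictionOutput shown in the question.
# With a real Trainer you would use: logits = new_predictions.predictions
logits = np.array(
    [
        [-0.43005577, 3.646306, -0.8073783, -1.0651836, -1.3480505, -1.108013],
        [3.5415223, -0.8513837, -1.8553216, -0.18011567, -0.35627165, -1.8364134],
        [-1.0167522, -0.8911268, -1.7115675, 0.01204597, 1.7177908, 1.0401527],
        [-0.82407415, -0.46043932, -1.089274, 2.6252217, 0.33935028, -1.3623345],
    ],
    dtype=np.float32,
)


def softmax(x, axis=-1):
    """Hypothetical helper: numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# One row of class probabilities per input example.
probabilities = softmax(logits)

# Index of the most likely class for each example.
predicted_class_ids = probabilities.argmax(axis=-1)

print(probabilities)
print(predicted_class_ids)
```

This assumes a single-label classification model; for a multi-label model you would typically apply a sigmoid to each logit instead of a softmax.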

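If you want the labels and scores for each class, I recommend using the pipeline that corresponds to your model's task (TextClassification, TokenClassification, etc.). The pipeline's __call__ method accepts a return_all_scores parameter that lets you get the scores of every label for a prediction.

Here is an example: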

from transformers import TextClassificationPipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "..."
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
prediction = pipe("The text to predict", return_all_scores=True)

Here is an example of what the prediction variable looks like:

[{'label': 'LABEL1', 'score': 0.80}, {'label': 'LABEL2', 'score': 0.15}, {'label': 'LABEL3', 'score': 0.05}]

The label names can be set in the model's config.json file, or when creating the model (before training) by defining the id2label and label2id model parameters:

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=num_labels,
    label2id={"Greeting": 0, "Help": 1, "Farewell": 2},
    id2label={0: "Greeting", 1: "Help", 2: "Farewell"},
)
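
The same id2label mapping can also be used to turn class ids obtained from Trainer.predict logits into label names. The snippet below is a small sketch with a plain dictionary and made-up logits; with a real model the mapping should be available as `model.config.id2label`.

```python
import numpy as np

# Same mapping as passed to from_pretrained above.
# With a real model you would read it from model.config.id2label.
id2label = {0: "Greeting", 1: "Help", 2: "Farewell"}

# Hypothetical logits for three inputs (illustrative values only).
logits = np.array(
    [
        [2.0, 0.1, -1.0],
        [-0.5, 1.5, 0.3],
        [0.0, -1.2, 3.1],
    ]
)

# Pick the highest-scoring class per row, then look up its name.
predicted_ids = logits.argmax(axis=-1)
predicted_labels = [id2label[int(i)] for i in predicted_ids]

print(predicted_labels)
```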

python

nlp

huggingface-transformers