How to set the label names when using the Huggingface TextClassificationPipeline?
I am using a Huggingface model fine-tuned on my company's data, together with a TextClassificationPipeline, to make class predictions. Right now the labels the pipeline predicts default to LABEL_0, LABEL_1, and so on. Is there a way to supply a label mapping to the TextClassificationPipeline object so that the output reflects it?
Env:
- tensorflow==2.3.1
- transformers==4.3.2
Sample code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
from transformers import TextClassificationPipeline, TFAutoModelForSequenceClassification, AutoTokenizer
MODEL_DIR = r"path\to\my\fine-tuned\model"  # raw string so backslashes aren't treated as escapes
# Text classification pipeline
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
pipeline = TextClassificationPipeline(model=model,
                                      tokenizer=tokenizer,
                                      framework='tf',
                                      device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
Output:
In [2]: result
Out[2]: {'label': 'LABEL_1', 'score': 0.8864616751670837}
The simplest way to add such a mapping is to edit the model's config.json to include an id2label field, like so:
{
"_name_or_path": "distilbert-base-uncased",
"activation": "gelu",
"architectures": [
"DistilBertForMaskedLM"
],
"id2label": {
"0": "negative",
"1": "positive"
},
"attention_dropout": 0.1,
...
}
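If you'd rather not edit the file by hand, the same change can be scripted with the standard json module. A minimal sketch, assuming the config path and label names are placeholders you'd substitute (transformers stores id2label with string keys in config.json, and the companion label2id field with integer values):

```python
import json

def add_id2label(config_path, labels):
    """Insert id2label/label2id mappings into a transformers config.json."""
    with open(config_path) as f:
        config = json.load(f)
    # config.json uses string keys for id2label ("0", "1", ...)
    config["id2label"] = {str(i): label for i, label in enumerate(labels)}
    config["label2id"] = {label: i for i, label in enumerate(labels)}
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```

You would then call `add_id2label` on the config.json inside MODEL_DIR and reload the model with from_pretrained.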
The in-code way to set this mapping is to add an id2label parameter to the from_pretrained call, like so:
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR, id2label={0: 'negative', 1: 'positive'})
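For intuition, what the pipeline does with this mapping is essentially: softmax the model's logits, take the argmax, and look that index up in config.id2label. A plain-Python sketch of that lookup (the logits here are made up, not from the model):

```python
import math

def classify(logits, id2label):
    """Softmax over raw logits, then map the argmax index to its label."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# With id2label set, index 1 now surfaces as 'positive' instead of 'LABEL_1'
result = classify([-1.2, 0.8], {0: "negative", 1: "positive"})
```

Without the mapping, the pipeline falls back to the generated LABEL_0, LABEL_1 names, which is exactly what you're seeing.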
Here is the GitHub issue I raised to get this added to the documentation of transformers.XForSequenceClassification.