PipelineException: No mask_token ([MASK]) found on the input
I get the error "PipelineException: No mask_token ([MASK]) found on the input" when I run this line:
fill_mask("Auto Car <mask>.")
I am running it on Colab.
My code:
from transformers import BertTokenizer, BertForMaskedLM
from pathlib import Path
from tokenizers import ByteLevelBPETokenizer
from transformers import BertTokenizer, BertForMaskedLM
paths = [str(x) for x in Path(".").glob("**/*.txt")]
print(paths)
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
from transformers import BertModel, BertConfig
configuration = BertConfig()
model = BertModel(configuration)
configuration = model.config
print(configuration)
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
from transformers import LineByLineTextDataset
dataset = LineByLineTextDataset(
tokenizer=bert_tokenizer,
file_path="./kant.txt",
block_size=128,
)
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(
tokenizer=bert_tokenizer, mlm=True, mlm_probability=0.15
)
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
output_dir="./KantaiBERT",
overwrite_output_dir=True,
num_train_epochs=1,
per_device_train_batch_size=64,
save_steps=10_000,
save_total_limit=2,
)
trainer = Trainer(
model=model,
args=training_args,
data_collator=data_collator,
train_dataset=dataset,
)
trainer.train()
from transformers import pipeline
fill_mask = pipeline(
"fill-mask",
model=model,
tokenizer=bert_tokenizer,
device=0,
)
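fill_mask("Auto Car <mask>.")  # This line is giving me the error...
The last line gives me the error mentioned above. Please let me know what I am doing wrong, or what I have to do to get rid of this error.
Complete error: f"No mask_token ({self.tokenizer.mask_token}) found on the input"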
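Even though you have already found the error, here is a recommendation for avoiding it in the future. Instead of calling
fill_mask("Auto Car <mask>.")
you can do the following, which stays correct when you switch models (tokenizer here is whichever tokenizer you passed to the pipeline, bert_tokenizer in the question):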
MASK_TOKEN = tokenizer.mask_token
fill_mask("Auto Car {}.".format(MASK_TOKEN))
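If a different model uses a different mask token (some use <mask>, others use [MASK]), a hard-coded string will break. It is better to build the input from tokenizer.mask_token, for example with an f-string, which is also easy to read.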
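To make the point above concrete, here is a simplified, string-level sketch of why the input from the question fails with bert-base-uncased. The MASK_TOKENS table and the has_mask helper are hypothetical names used only for illustration; the real pipeline works on the tokenized input rather than on a plain substring test.

```python
# Mask tokens used by two common checkpoints (BERT vs. RoBERTa style).
MASK_TOKENS = {
    "bert-base-uncased": "[MASK]",
    "roberta-base": "<mask>",
}


def has_mask(text: str, mask_token: str) -> bool:
    """Hypothetical helper: does the text contain this model's mask token?"""
    return mask_token in text


question_input = "Auto Car <mask>."

# "<mask>" is a RoBERTa-style token, so the BERT check fails.
ok_for_bert = has_mask(question_input, MASK_TOKENS["bert-base-uncased"])
ok_for_roberta = has_mask(question_input, MASK_TOKENS["roberta-base"])

# Building the input from the model's own token passes the check.
fixed_input = "Auto Car {}.".format(MASK_TOKENS["bert-base-uncased"])
fixed_ok = has_mask(fixed_input, MASK_TOKENS["bert-base-uncased"])
```

With these values, ok_for_bert is False while ok_for_roberta and fixed_ok are True, which matches the error reported in the question.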
The following code worked for me:
from transformers import BertTokenizer, pipeline
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mask_fill = pipeline("fill-mask", model="bert-base-uncased")
mask_fill(f"The gaming laptop is {tokenizer.mask_token} and I have loved playing games on it.", top_k=2)
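The same idea can be wrapped in a small helper so that the mask token always comes from the pipeline's own tokenizer. This is only a sketch: fill_with_mask and the "{mask}" placeholder are hypothetical names, not part of transformers, and the helper assumes the pipeline object exposes its tokenizer as the .tokenizer attribute.

```python
def fill_with_mask(fill_mask, template, **kwargs):
    """Call a fill-mask pipeline on `template`.

    `template` marks the blank with the hypothetical "{mask}" placeholder,
    which is replaced by the mask token of the pipeline's own tokenizer.
    Extra keyword arguments (for example top_k) are passed through.
    """
    if "{mask}" not in template:
        raise ValueError('template must contain the "{mask}" placeholder')
    mask_token = fill_mask.tokenizer.mask_token
    return fill_mask(template.replace("{mask}", mask_token), **kwargs)
```

With the pipeline from the question, the call would then look like fill_with_mask(fill_mask, "Auto Car {mask}.", top_k=2), and it would not need changing if the model were swapped for one that uses <mask>.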