"ValueError: You have to specify either input_ids or inputs_embeds" when training AutoModelWithLMHead Model (GPT-2)
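I want to fine-tune the AutoModelWithLMHead model from this repository, which is a German GPT-2 model. I followed the tutorials for preprocessing and fine-tuning, and I prepared a set of text passages for fine-tuning. However, when I start training, I get the following error: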
  File "GPT\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "GPT\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 774, in forward
    raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds
Here is my code for reference:
import numpy
from datasets import load_metric
from transformers import AutoModelWithLMHead, AutoTokenizer, Trainer, TrainingArguments

# Load data
with open("Fine-Tuning Dataset/train.txt", "r", encoding="utf-8") as train_file:
    train_data = train_file.read().split("--")
with open("Fine-Tuning Dataset/test.txt", "r", encoding="utf-8") as test_file:
    test_data = test_file.read().split("--")

# Load pre-trained tokenizer and prepare input
tokenizer = AutoTokenizer.from_pretrained('dbmdz/german-gpt2')
tokenizer.pad_token = tokenizer.eos_token
train_input = tokenizer(train_data, padding="longest")
test_input = tokenizer(test_data, padding="longest")

# Define model
model = AutoModelWithLMHead.from_pretrained("dbmdz/german-gpt2")
training_args = TrainingArguments("test_trainer")

# Evaluation
metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = numpy.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_input,
    eval_dataset=test_input,
    compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate()
Does anyone know what is causing this? Any help is welcome!
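A likely cause, offered as an assumption rather than a confirmed diagnosis: `Trainer` expects `train_dataset` and `eval_dataset` to be map-style datasets, where `dataset[i]` returns one example as a dict such as `{"input_ids": [...], "attention_mask": [...]}`. The object returned by `tokenizer(train_data, padding="longest")` is instead a dict of whole columns, so indexing it with an integer does not produce such a per-example dict, and the batch that reaches `model.forward` ends up without `input_ids`. A causal LM also only returns a loss when `labels` are supplied. The sketch below wraps the tokenizer output in a small dataset class; the class name `CausalLMDataset` is hypothetical, and a plain class with `__len__`/`__getitem__` is used (instead of a `torch.utils.data.Dataset` subclass) only to keep the example dependency-free.

```python
class CausalLMDataset:
    """Hypothetical wrapper: turns column-oriented tokenizer output into per-example dicts."""

    def __init__(self, encodings):
        # `encodings` maps column names to lists of rows, e.g. the object returned by
        # tokenizer(list_of_texts, padding="longest").
        self.input_ids = encodings["input_ids"]
        self.attention_mask = encodings["attention_mask"]

    def __len__(self):
        # Number of examples (rows), not number of columns.
        return len(self.input_ids)

    def __getitem__(self, idx):
        ids = list(self.input_ids[idx])
        mask = list(self.attention_mask[idx])
        # For causal LM the labels are the inputs themselves (the model shifts them
        # internally). Padded positions get -100 so the cross-entropy loss ignores them.
        labels = [tok if m == 1 else -100 for tok, m in zip(ids, mask)]
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}
```

With the question's variables this would be used as `train_dataset=CausalLMDataset(train_input)` and `eval_dataset=CausalLMDataset(test_input)`. Because `padding="longest"` pads every text in a split to the same length, the default collator can stack the examples directly. Masking on `attention_mask` rather than on the pad-token id matters here: the pad token was set to the EOS token, so comparing against the pad id would also hide genuine EOS tokens. The question's `compute_metrics` is unlikely to work unchanged either, since a causal LM returns logits of shape (batch, sequence length, vocabulary size) rather than one prediction per example.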
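I didn't find a specific answer to this problem, but I found a workaround. For anyone looking for examples of how to fine-tune HuggingFace's GPT models, have a look at this repo. It contains several examples of fine-tuning different Transformer models, accompanied by documented code. I used the run_clm.py script, and it did what I wanted.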
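One reason a script like run_clm.py avoids the padding and label issues above is that, after tokenizing, it concatenates all token sequences and cuts them into fixed-size blocks, copying `input_ids` into `labels`. The dependency-free sketch below is modeled on that grouping step (the script's `group_texts` function), not copied from it; `block_size` is passed as a parameter here for illustration.

```python
from itertools import chain


def group_texts(examples, block_size):
    """Concatenate tokenized examples and split them into equal-length blocks."""
    # Flatten every column (input_ids, attention_mask, ...) into one long list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the tail that does not fill a whole block, so no padding is needed.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Causal LM labels are a copy of the inputs; the model shifts them internally.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result
```

With a 🤗 `datasets.Dataset` of tokenized, unpadded text (raw text column already removed), the function could be applied with something like `dataset.map(group_texts, batched=True, fn_kwargs={"block_size": 128})`, where 128 is only an example value.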