Multiple training runs with Hugging Face transformers give exactly the same result except for the first time
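I have a function that loads a pre-trained model from Hugging Face, fine-tunes it for sentiment analysis, then computes the F1 score and returns the result.
The problem is that when I call this function several times with exactly the same arguments, it gives exactly the same metric score, as expected, except for the first call, which is different. How is that possible?
Here is the function, which I wrote based on this tutorial from Hugging Face: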
import uuid

import numpy as np
from datasets import (
    load_dataset,
    load_metric,
    DatasetDict,
    concatenate_datasets
)
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

CHECKPOINT = "distilbert-base-uncased"
SAVING_FOLDER = "sst2"

def custom_train(datasets, checkpoint=CHECKPOINT, saving_folder=SAVING_FOLDER):
    # Load the pre-trained body; the classification head is newly (randomly) initialized.
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def tokenize_function(example):
        return tokenizer(example["sentence"], truncation=True)

    tokenized_datasets = datasets.map(tokenize_function, batched=True)
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

    # Use a unique output directory per call, derived from the saving_folder argument.
    saving_folder = f"{saving_folder}_{uuid.uuid1()}"
    training_args = TrainingArguments(saving_folder)
    trainer = Trainer(
        model,
        training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()

    # Predict on the test split and compute F1 from the argmax of the logits.
    predictions = trainer.predict(tokenized_datasets["test"])
    print(predictions.predictions.shape, predictions.label_ids.shape)
    preds = np.argmax(predictions.predictions, axis=-1)
    metric_fun = load_metric("f1")
    metric_result = metric_fun.compute(predictions=preds, references=predictions.label_ids)
    return metric_result
I then run this function several times on the same dataset, appending the returned F1 score each time:
raw_datasets = load_dataset("glue", "sst2")
small_datasets = DatasetDict({
    "train": raw_datasets["train"].select(range(100)).flatten_indices(),
    "validation": raw_datasets["validation"].select(range(100)).flatten_indices(),
    "test": raw_datasets["validation"].select(range(100, 200)).flatten_indices(),
})

results = []
for i in range(4):
    result = custom_train(small_datasets)
    results.append(result)
When I then look at the results list:
[{'f1': 0.7755102040816325}, {'f1': 0.5797101449275361}, {'f1': 0.5797101449275361}, {'f1': 0.5797101449275361}]
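One thing that comes to mind is that when I load a pre-trained model, the head is initialized with random weights, which would explain why results differ. But if that is the case, why is only the first result different while the others are exactly the same?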
Sylvain Gugger answered this question here: https://discuss.huggingface.co/t/multiple-training-will-give-exactly-the-same-result-except-for-the-first-time/8493
You need to set the seed before instantiating your model; otherwise the random head is not initialized the same way, which is why the first run will always be different.
The subsequent runs are all the same because the seed has been set by the Trainer in the train method.
To set the seed:
from transformers import set_seed
set_seed(42)
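The mechanism described in the answer can be imitated without downloading any model. The following is a minimal sketch using only Python's standard `random` module as a stand-in for the framework's random number generator. The helpers `init_head`, `train`, `run_unseeded`, and `run_seeded` are hypothetical names made up for this illustration and are not part of transformers; `random.seed` plays the role of `set_seed`.

```python
import random

def init_head(n=3):
    # Stand-in for the randomly initialized classification head:
    # it draws from whatever state the global RNG is in right now.
    return [random.uniform(-0.02, 0.02) for _ in range(n)]

def train(head, seed=42):
    # Stand-in for Trainer.train(): it seeds the global RNG first,
    # then consumes random numbers (as shuffling or dropout would).
    random.seed(seed)
    noise = sum(random.random() for _ in range(5))
    return sum(head) + noise

def run_unseeded():
    # Mirrors the question: the head is created before any seed is set.
    return train(init_head())

def run_seeded(seed=42):
    # Mirrors the fix: seed first, then create the head, then train.
    random.seed(seed)
    return train(init_head())

unseeded = [run_unseeded() for _ in range(4)]
seeded = [run_seeded() for _ in range(4)]

print("unseeded:", unseeded)  # first value differs, the last three match
print("seeded:  ", seeded)    # all four values match
```

In the unseeded runs, the first head is drawn from the unpredictable start-up state of the generator, while every later head is drawn from the state left behind by the previous seeded training, so runs two to four coincide. Applied to the question's loop, the equivalent change would be to call `set_seed(42)` as the first statement inside the `for` loop, before `custom_train(small_datasets)`.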