ValueError: The state dictionary of the model you are training to load is corrupted. Are you sure it was properly saved?
Goal: modify this notebook to use the albert-base-v2 model.
Kernel: conda_pytorch_p36
Section 1.2 instantiates the model from the files in the ./MRPC/
directory. However, I believe that works for a BERT model, not ALBERT. So I downloaded an ALBERT config.json
file from here. It is this change that causes the error.
What else do I need to do to instantiate the ALBERT model?
The ./MRPC/
directory:
!curl https://download.pytorch.org/tutorial/MRPC.zip --output MRPC.zip
!unzip -n MRPC.zip
from os import listdir
from os.path import isfile, join
mypath = './MRPC/'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
onlyfiles
---
['tokenizer_config.json',
'special_tokens_map.json',
'pytorch_model.bin',
'config.json',
'training_args.bin',
'added_tokens.json',
'vocab.txt']
Configuration:
# The output directory for the fine-tuned model, $OUT_DIR.
configs.output_dir = "./MRPC/"
# The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
configs.data_dir = "./glue_data/MRPC"
# The model name or path for the pre-trained model.
configs.model_name_or_path = "albert-base-v2"
# The maximum length of an input sequence
configs.max_seq_length = 128
# Prepare GLUE task.
configs.task_name = "MRPC".lower()
configs.processor = processors[configs.task_name]()
configs.output_mode = output_modes[configs.task_name]
configs.label_list = configs.processor.get_labels()
configs.model_type = "albert".lower()
configs.do_lower_case = True
# Set the device, batch size, topology, and caching flags.
configs.device = "cpu"
configs.eval_batch_size = 1
configs.n_gpu = 0
configs.local_rank = -1
configs.overwrite_cache = False
Model:
model = AlbertForSequenceClassification.from_pretrained(configs.output_dir) # !
model.to(configs.device)
Traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-36-0936fd8cbb17> in <module>
1 # load model
----> 2 model = AlbertForSequenceClassification.from_pretrained(configs.output_dir)
3 model.to(configs.device)
4
5 # quantize model
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1460 pretrained_model_name_or_path,
1461 ignore_mismatched_sizes=ignore_mismatched_sizes,
-> 1462 _fast_init=_fast_init,
1463 )
1464
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/modeling_utils.py in _load_state_dict_into_model(cls, model, state_dict, pretrained_model_name_or_path, ignore_mismatched_sizes, _fast_init)
1601 if any(key in expected_keys_not_prefixed for key in loaded_keys):
1602 raise ValueError(
-> 1603 "The state dictionary of the model you are training to load is corrupted. Are you sure it was "
1604 "properly saved?"
1605 )
ValueError: The state dictionary of the model you are training to load is corrupted. Are you sure it was properly saved?
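For reference, the mismatch behind this error can be confirmed without transformers at all: the loader compares the parameter names stored in pytorch_model.bin against the architecture declared in config.json (ALBERT weights are prefixed "albert.", BERT weights "bert."). A minimal sketch with hypothetical, truncated key names from a BERT MRPC checkpoint:

```python
# Hypothetical keys, as would be returned in the real notebook by:
#   torch.load("./MRPC/pytorch_model.bin", map_location="cpu").keys()
loaded_keys = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "classifier.weight",
    "classifier.bias",
]

# The top-level prefixes reveal which architecture the checkpoint was
# saved from; "bert" here means AlbertForSequenceClassification cannot
# load it, even if config.json says the model type is albert.
prefixes = sorted({key.split(".", 1)[0] for key in loaded_keys})
print(prefixes)  # ['bert', 'classifier'] -> BERT weights, not ALBERT
```

So swapping only config.json is not enough: the weights themselves are BERT weights, which is exactly why from_pretrained raises the "state dictionary ... corrupted" ValueError.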
Exactly what I was looking for: textattack/albert-base-v2-MRPC.
How to use it with the transformers library:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("textattack/albert-base-v2-MRPC")
model = AutoModelForSequenceClassification.from_pretrained("textattack/albert-base-v2-MRPC")
Or just clone the model repository:
git lfs install
git clone https://huggingface.co/textattack/albert-base-v2-MRPC
# if you want to clone without large files – just their pointers
# prepend your git clone with the following env var:
GIT_LFS_SKIP_SMUDGE=1
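After cloning, the notebook's existing config can be pointed at the local checkpoint instead of ./MRPC/; the loading cell then runs unchanged. A sketch of the config fragment (the directory name is assumed to match the default created by the git clone above):

```python
# Point the fine-tuned-model directory at the local clone.
configs.output_dir = "./albert-base-v2-MRPC"
configs.model_name_or_path = "textattack/albert-base-v2-MRPC"
# The existing cell then loads an ALBERT checkpoint with matching keys:
#   model = AlbertForSequenceClassification.from_pretrained(configs.output_dir)
```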