Use Quantization on HuggingFace Transformers models

I am learning about quantization and am experimenting with the Part 1 notebook.

I want to use this code on my own model.

I assume I only need to change what is assigned to the model variable in section 1.2:
# load model
model = BertForSequenceClassification.from_pretrained(configs.output_dir)
model.to(configs.device)

My model comes from a different route: from transformers import pipeline. So .to() throws an AttributeError.

My model:

# pip install transformers
from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-uncased')
model = unmasker("Hello I'm a [MASK] model.")

Output:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

How do I run the linked quantization code on my example model?

Please let me know if there is anything else I need to clarify in this post.

The pipeline approach won't work for quantization, because we need the model object itself to be returned. However, you can still use pipeline to test things like the timing of the original models.
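For example, a minimal sketch of timing the original (unquantized) model through pipeline; the use of time.perf_counter and the example sentence are illustrative assumptions, not part of the quoted answer:

import time
from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-uncased')

# time a single fill-mask call on the original, unquantized model
start = time.perf_counter()
predictions = unmasker("Hello I'm a [MASK] model.")
elapsed = time.perf_counter() - start

print(f"fill-mask took {elapsed:.3f} s")
print(predictions[0])  # top prediction: a dict with 'sequence', 'score', 'token', 'token_str'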


Quantization code:

token_logits contains the tensors of the quantized model.

You can put a for-loop around this code and replace model_name with each string from a list.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

sequence = "Distilled models are smaller than the models they mimic. Using them instead of the large " \
           f"versions would help {tokenizer.mask_token} our carbon footprint."

inputs = tokenizer(sequence, return_tensors="pt")
# index of the [MASK] token within the input sequence
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

token_logits = model(**inputs).logits

# <- can stop here
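The snippet above stops before the quantization step itself. As a rough sketch (not part of the quoted code), you could apply PyTorch dynamic quantization to the loaded model and wrap everything in the for-loop mentioned above; the list of checkpoints and the print logic here are illustrative assumptions:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# illustrative list of checkpoints to loop over
model_names = ["bert-base-uncased", "distilbert-base-uncased"]

for model_name in model_names:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # dynamic quantization: replace nn.Linear weights with int8 versions
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    sequence = f"Using distilled models would help {tokenizer.mask_token} our carbon footprint."
    inputs = tokenizer(sequence, return_tensors="pt")
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

    # logits from the quantized model
    token_logits = quantized_model(**inputs).logits
    mask_token_logits = token_logits[0, mask_token_index, :]
    top_token = torch.argmax(mask_token_logits, dim=1)
    print(model_name, tokenizer.decode(top_token))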

Source