Use Quantization on HuggingFace Transformers models
I am learning about quantization and experimenting with the Part 1 notebook. I want to use this code on my own model. I assume I only need to change what is assigned to the model variable in section 1.2:
# load model
model = BertForSequenceClassification.from_pretrained(configs.output_dir)
model.to(configs.device)
My model comes from a different library: from transformers import pipeline. So .to() throws an AttributeError.
My model:
pip install transformers
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
model = unmasker("Hello I'm a [MASK] model.")
Output:
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
How do I run the linked quantization code on my example model?
Please let me know if there is anything else in my post that I need to clarify.
The pipeline approach will not work for quantization, because we need the model object itself to be returned. However, you can use pipeline to test timings etc. of the original models.
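For example, a minimal sketch of timing the original (unquantized) model through the pipeline; the sentence and the single-call timing here are only illustrative:

import time
from transformers import pipeline

# Original, unquantized model wrapped in a fill-mask pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')

# Rough latency check for a single inference call (illustrative only)
start = time.perf_counter()
unmasker("Hello I'm a [MASK] model.")
print(f"fill-mask inference took {time.perf_counter() - start:.3f} s")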
Quantization code:
token_logits contains the tensors of the quantized model. You can put a for-loop around this code and replace model_name with strings from a list; a sketch of such a loop, including the quantization step, follows the code block below.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

sequence = "Distilled models are smaller than the models they mimic. Using them instead of the large " \
           f"versions would help {tokenizer.mask_token} our carbon footprint."

# Tokenize and locate the position of the [MASK] token
inputs = tokenizer(sequence, return_tensors="pt")
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

# Logits over the vocabulary for every token position
token_logits = model(**inputs).logits
# <- can stop here
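The steps after this point are not reproduced from the linked notebook; the sketch below assumes dynamic quantization of the Linear layers (as in the PyTorch dynamic-quantization approach for BERT) and uses an illustrative list of model names for the for-loop mentioned above:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative model list; replace with the checkpoints you care about
for model_name in ["bert-base-uncased", "distilbert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Quantize all nn.Linear weights to int8; activations stay float (CPU inference)
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    sequence = ("Distilled models are smaller than the models they mimic. Using them "
                f"instead of the large versions would help {tokenizer.mask_token} our carbon footprint.")
    inputs = tokenizer(sequence, return_tensors="pt")
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

    # Logits from the quantized model
    token_logits = quantized_model(**inputs).logits

    # Top prediction for the masked position
    top_token = token_logits[0, mask_token_index, :].argmax(dim=-1)
    print(model_name, tokenizer.decode(top_token))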