Token indices sequence length issue
I'm running a sentence-transformers model and trying to truncate my tokens, but it doesn't seem to work. My code is:
from transformers import AutoModel, AutoTokenizer
model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
text_tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
text_embedding = model(**text_tokens)["pooler_output"]
I keep getting the following warning:
Token indices sequence length is longer than the specified maximum sequence length
for this model (909 > 512). Running this sequence through the model will result in
indexing errors
I'd like to know why setting truncation=True doesn't truncate my text to the desired length.
You need to pass the max_length argument when calling the tokenizer, like this:
text_tokens = tokenizer(text, padding=True, max_length=512, truncation=True, return_tensors="pt")
The reason: truncation=True without a max_length argument truncates to the maximum input length the model accepts. For this model that is 1e30, or 1000000000000000019884624838656. You can check it by printing tokenizer.model_max_length.
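The fallback described above can be sketched in plain Python. This is a hypothetical simplification of the tokenizer's behavior (the function names here are made up for illustration, not part of the transformers API): when no max_length is given, the effective limit becomes model_max_length, which for this model is so large that nothing is ever cut off.

```python
# Hypothetical sketch of how the truncation limit is resolved;
# not the actual transformers source code.
VERY_LARGE_INTEGER = int(1e30)  # the placeholder transformers reports
                                # when no real maximum is configured

def resolve_limit(max_length, model_max_length):
    # With truncation=True but max_length=None, the tokenizer falls
    # back to model_max_length as the effective limit.
    return max_length if max_length is not None else model_max_length

def truncate(token_ids, max_length=None, model_max_length=VERY_LARGE_INTEGER):
    limit = resolve_limit(max_length, model_max_length)
    return token_ids[:limit]

ids = list(range(909))
print(len(truncate(ids)))                  # 909: limit is ~1e30, nothing is cut
print(len(truncate(ids, max_length=512)))  # 512: explicit limit takes effect
```

This is why the warning fires at 909 tokens even with truncation=True: the fallback limit is effectively unbounded, so the explicit max_length=512 is required.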
According to the Hugging Face documentation on truncation:
True or 'only_first' truncate to a maximum length specified by the
max_length argument or the maximum length accepted by the model if no
max_length is provided (max_length=None).