How to Suppress "Using bos_token, but it is not set yet..." in HuggingFace T5 Tokenizer
I want to turn off the warnings that huggingface produces when I use `unique_no_split_tokens`:
In[2] tokenizer = T5Tokenizer.from_pretrained("t5-base")
In[3] tokenizer(" ".join([f"<extra_id_{n}>" for n in range(1,101)]), return_tensors="pt").input_ids.size()
Out[3]: torch.Size([1, 100])
Using bos_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.
Using sep_token, but it is not set yet.
Does anyone know how to do this?

This solution worked for me:
# register the extra tokens as special tokens instead of no-split tokens
tokenizer.add_tokens([f"_{n}" for n in range(1,100)], special_tokens=True)
# grow the model's embedding matrix to match the enlarged vocabulary
model.resize_token_embeddings(len(tokenizer))
# save and reload the extended tokenizer
tokenizer.save_pretrained('pathToExtendedTokenizer/')
tokenizer = T5Tokenizer.from_pretrained('pathToExtendedTokenizer/')
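If you only want to silence the messages without re-registering tokens, you can filter them out of the logger that transformers uses. This is a sketch, not an official API: the logger name (`transformers.tokenization_utils_base`) is an assumption and may differ across transformers versions; `transformers.logging.set_verbosity_error()` is the documented blunt-instrument alternative, but it hides all warnings, not just these.

```python
import logging

class SpecialTokenFilter(logging.Filter):
    """Drop the 'Using X_token, but it is not set yet.' messages."""
    def filter(self, record: logging.LogRecord) -> bool:
        # returning False tells the logging machinery to discard the record
        return "but it is not set yet" not in record.getMessage()

# Logger name is an assumption; check which module emits the message
# in your transformers version and adjust accordingly.
logging.getLogger("transformers.tokenization_utils_base").addFilter(SpecialTokenFilter())
```

This keeps every other tokenizer warning visible, unlike lowering the global verbosity.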