How to input embeddings directly to a huggingface model instead of tokens?
I was going through the huggingface tutorial, where they show how to feed tokens into a model to produce hidden representations:
import torch
from transformers import RobertaTokenizer, RobertaModel

checkpoint = 'roberta-base'
tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
model = RobertaModel.from_pretrained(checkpoint)

sequences = ["I've been waiting for a HuggingFace course my whole life."]
tokens = tokenizer(sequences, padding=True)
out = model(torch.tensor(tokens['input_ids']))
out.last_hidden_state  # shape: (batch_size, sequence_length, hidden_size)
But how can I feed in word embeddings directly instead of tokens? That is, I have another model that produces word embeddings, and I need to feed those into the model.
Most (every?) huggingface encoder model supports the `inputs_embeds` parameter:
import torch
from transformers import RobertaModel

m = RobertaModel.from_pretrained("roberta-base")
my_input = torch.rand(2, 5, 768)  # (batch_size, sequence_length, hidden_size)
outputs = m(inputs_embeds=my_input)
P.S.: Don't forget the attention mask, just in case.
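To illustrate that last point: when the sequences in a batch have different real lengths, the padded embedding positions should get `attention_mask = 0` so the model ignores them. A minimal sketch of building such a mask with plain torch (the lengths here are made-up example values; `m` is the model loaded above):

```python
import torch

batch_size, max_len, hidden_size = 2, 5, 768
lengths = torch.tensor([5, 3])  # hypothetical true lengths of the two sequences

# Embeddings from your other model, padded to max_len along dim 1.
my_input = torch.rand(batch_size, max_len, hidden_size)

# 1 for real positions, 0 for padding: compare position index against length.
attention_mask = (torch.arange(max_len)[None, :] < lengths[:, None]).long()
# attention_mask is [[1, 1, 1, 1, 1],
#                    [1, 1, 1, 0, 0]]

# Then pass both to the model:
# outputs = m(inputs_embeds=my_input, attention_mask=attention_mask)
```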