How to input embeddings directly to a huggingface model instead of tokens?

I was going through the huggingface tutorial, where they show how to feed tokens into a model to produce hidden representations:

import torch
from transformers import RobertaTokenizer
from transformers import RobertaModel

checkpoint = 'roberta-base'
tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
model = RobertaModel.from_pretrained(checkpoint)

sequences = ["I've been waiting for a HuggingFace course my whole life."]

# Return PyTorch tensors directly instead of Python lists
tokens = tokenizer(sequences, padding=True, return_tensors="pt")
out = model(**tokens)  # forwards input_ids and attention_mask
out.last_hidden_state

But how can I input word embeddings directly instead of tokens? That is, I have another model that produces word embeddings, and I need to feed those into the model.

Most (all?) huggingface encoder models support the inputs_embeds argument:

import torch
from transformers import RobertaModel

m = RobertaModel.from_pretrained("roberta-base")

# shape: (batch_size, sequence_length, hidden_size); roberta-base uses hidden_size=768
my_input = torch.rand(2, 5, 768)

outputs = m(inputs_embeds=my_input)
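If you want to convince yourself that inputs_embeds takes the same path as input_ids, one way is to look up the model's own embedding table via get_input_embeddings() and compare the two forward passes. The sketch below uses a small randomly initialized RobertaConfig (the sizes are my own assumptions, chosen to avoid downloading a checkpoint), not the roberta-base weights:

```python
import torch
from transformers import RobertaConfig, RobertaModel

# Small randomly initialized model: the sizes below are assumptions
# made to keep the sketch light, not roberta-base's real config.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = RobertaModel(config)
model.eval()  # disable dropout so the two forward passes are deterministic

# Avoid token id 1 (the default pad token) so position ids line up
input_ids = torch.tensor([[10, 20, 30, 40]])
# Look up the embeddings the model itself would use for these tokens
embeds = model.get_input_embeddings()(input_ids)

with torch.no_grad():
    out_ids = model(input_ids=input_ids)
    out_emb = model(inputs_embeds=embeds)

# Both paths should produce the same hidden states
print(torch.allclose(out_ids.last_hidden_state,
                     out_emb.last_hidden_state, atol=1e-6))
```

In your case you would simply replace embeds with the output of your other embedding model, as long as it has the right hidden size.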

P.S.: Don't forget the attention mask, just in case.
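For example, if some of your embedded sequences are padded, pass an attention_mask alongside inputs_embeds so attention ignores the padded positions. A minimal sketch (again with a randomly initialized model so nothing is downloaded; the config sizes are assumptions):

```python
import torch
from transformers import RobertaConfig, RobertaModel

# Randomly initialized model with roberta-base's hidden size of 768;
# the smaller depth here is an assumption to keep the sketch fast.
m = RobertaModel(RobertaConfig(hidden_size=768, num_hidden_layers=2,
                               num_attention_heads=12, intermediate_size=3072))

my_input = torch.rand(2, 5, 768)
# Suppose the second sequence has only 3 real positions; mark the
# last two as padding so attention ignores them.
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 0, 0]])

outputs = m(inputs_embeds=my_input, attention_mask=attention_mask)
print(outputs.last_hidden_state.shape)  # torch.Size([2, 5, 768])
```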