BERT: Unable to reproduce sentence-to-embedding operation

I am trying to convert a sentence into an embedding with the following code.

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text = "[CLS] This is a sentence. [SEP]"
tokens = tokenizer.tokenize(text)
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
encoded_layers, pooled_output = model(input_ids, output_all_encoded_layers=False)

The code runs. However, every time I run it, it produces different results: for the same input, the values of encoded_layers and pooled_output change on every run.

Thanks for your help!

Dropout is probably still active at inference time. Try calling model.eval() before running the model.
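Here is a minimal, self-contained sketch of the effect (plain PyTorch, no BERT weights needed): in training mode a dropout layer randomly zeroes activations, so repeated forward passes on the same input differ, while after .eval() it becomes a no-op and the output is deterministic.

```python
import torch

torch.manual_seed(0)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 256)

# Training mode: dropout randomly zeroes activations, so two
# forward passes over the same input (almost surely) differ.
dropout.train()
a, b = dropout(x), dropout(x)
print(torch.equal(a, b))

# Eval mode: dropout passes the input through unchanged, so
# repeated forward passes are identical.
dropout.eval()
c, d = dropout(x), dropout(x)
print(torch.equal(c, d))  # True
```

model.eval() recursively puts every dropout (and batch-norm) submodule of a model into this deterministic mode, which is why it fixes the varying BERT outputs above.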

Also, the transformers library is the long-term supported successor; stop using pytorch_pretrained_bert, which is deprecated.

import torch
from transformers import BertTokenizerFast, BertModel

bert_path = "/Users/Caleb/Desktop/codes/ptms/bert-base"
tokenizer = BertTokenizerFast.from_pretrained(bert_path)
model = BertModel.from_pretrained(bert_path)
model.eval()  # disable dropout so inference is deterministic

max_length = 32
test_str = "This is a sentence."
tokenized = tokenizer(test_str, max_length=max_length, padding="max_length", return_tensors="pt")
input_ids = tokenized["input_ids"]            # shape (1, 32)
attention_mask = tokenized["attention_mask"]  # 1 for real tokens, 0 for padding
with torch.no_grad():  # no gradients needed at inference
    res = model(input_ids, attention_mask=attention_mask)
print(res.last_hidden_state)
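Note that last_hidden_state is a per-token tensor of shape (batch, seq_len, hidden). To collapse it into one vector per sentence, a common approach (an assumption here, not part of the answer above) is attention-mask-aware mean pooling, so that padding tokens do not dilute the average. A sketch, where mean_pool is a hypothetical helper:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over real tokens only.
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)        # (batch, 1)
    return summed / counts

# Toy check: seq len 4 (last two positions are padding), hidden size 2.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0, 0]])
print(mean_pool(hidden, mask))  # tensor([[2., 3.]])
```

Applied to the BERT output above, that would be mean_pool(res.last_hidden_state, attention_mask); alternatively, res.last_hidden_state[:, 0] gives the [CLS] vector.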