Predicting Sentiment of Raw Text using Trained BERT Model, Hugging Face
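I am running sentiment analysis on tweets with three classes: positive, negative, and neutral. I have trained a BERT model using Hugging Face. Now I want to make predictions on a dataframe of unlabeled Twitter text, and I am stuck.
I followed this tutorial (https://curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/) and was able to train a BERT model using Hugging Face.
Here is an example of predicting on raw text, but it handles only a single sentence, and I want to use a column of tweets: https://curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/#predicting-on-raw-text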
    review_text = "I love completing my todos! Best app ever!!!"
    encoded_review = tokenizer.encode_plus(
        review_text,
        max_length=MAX_LEN,
        add_special_tokens=True,
        return_token_type_ids=False,
        pad_to_max_length=True,
        return_attention_mask=True,
        return_tensors='pt',
    )
    input_ids = encoded_review['input_ids'].to(device)
    attention_mask = encoded_review['attention_mask'].to(device)
    output = model(input_ids, attention_mask)
    _, prediction = torch.max(output, dim=1)
    print(f'Review text: {review_text}')
    print(f'Sentiment : {class_names[prediction]}')

Output:

    Review text: I love completing my todos! Best app ever!!!
    Sentiment : positive
Bill's answer worked. Here is the solution.
    def predictionPipeline(text):
        # Tokenize one tweet into fixed-length tensors
        encoded_review = tokenizer.encode_plus(
            text,
            max_length=MAX_LEN,
            add_special_tokens=True,
            return_token_type_ids=False,
            pad_to_max_length=True,
            return_attention_mask=True,
            return_tensors='pt',
        )
        input_ids = encoded_review['input_ids'].to(device)
        attention_mask = encoded_review['attention_mask'].to(device)
        output = model(input_ids, attention_mask)
        # Index of the highest score is the predicted class
        _, prediction = torch.max(output, dim=1)
        return class_names[prediction]

    # Apply the function to every row of the tweet column
    df2['prediction'] = df2['cleaned_tweet'].apply(predictionPipeline)
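Applying the function row by row runs the model once per tweet, which can be slow on a large dataframe. A common alternative is to send the texts through in batches. The sketch below shows only the batching logic in plain Python. The names `chunks`, `predict_in_batches`, and `fake_predict_batch` are hypothetical and not part of the tutorial or of any library; `fake_predict_batch` is a stand-in for a function that would tokenize a list of texts together and run the trained model on them.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(
    texts: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Call `predict_batch` on each chunk and keep the labels in input order."""
    labels: List[str] = []
    for batch in chunks(texts, batch_size):
        labels.extend(predict_batch(batch))
    return labels


# Hypothetical stand-in for the real model call (assumption, for illustration).
# A real version would tokenize the whole batch at once, run the model, and
# map each predicted index to an entry of class_names.
def fake_predict_batch(batch: Sequence[str]) -> List[str]:
    return ["positive" if "love" in text else "neutral" for text in batch]


tweets = ["I love this", "meh", "love it", "ok", "fine"]
labels = predict_in_batches(tweets, fake_predict_batch, batch_size=2)
print(labels)
```

With a real batch function in place of the stand-in, the result could be assigned back with something like df2['prediction'] = predict_in_batches(df2['cleaned_tweet'].tolist(), my_predict_batch), where my_predict_batch is again a name you would define yourself.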
You can use the same code to predict on text from a dataframe column.
    import pandas as pd
    import torch

    model = ...
    tokenizer = ...

    def predict(review_text):
        # Tokenize one text into fixed-length tensors
        encoded_review = tokenizer.encode_plus(
            review_text,
            max_length=MAX_LEN,
            add_special_tokens=True,
            return_token_type_ids=False,
            pad_to_max_length=True,
            return_attention_mask=True,
            return_tensors='pt',
        )
        input_ids = encoded_review['input_ids'].to(device)
        attention_mask = encoded_review['attention_mask'].to(device)
        output = model(input_ids, attention_mask)
        # Index of the highest score is the predicted class
        _, prediction = torch.max(output, dim=1)
        print(f'Review text: {review_text}')
        print(f'Sentiment : {class_names[prediction]}')
        return class_names[prediction]

    df = pd.DataFrame({
        'texts': ["text1", "text2", "...."]
    })
    # Store the predictions in a new column of the same dataframe
    df["sentiments"] = df.apply(lambda l: predict(l.texts), axis=1)
Bill's answer is great. However, when I ran the code in May 2022, it raised this error on my side:
    TypeError: torch.max received an invalid combination of arguments - got
    (numpy.ndarray, dim=int), but expected one of: (torch.FloatTensor source)
    (torch.FloatTensor source, torch.FloatTensor other) didn't match because some of the keywords were incorrect: dim
    (torch.FloatTensor source, int dim)
    (torch.FloatTensor source, int dim, bool keepdim)
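The structure of the model's output seems to have changed. It is no longer a tensor object but a tuple containing a tensor and some other things.
Changing torch.max(output, dim=1) to torch.max(output[0], dim=1) solved the problem. See: https://discuss.pytorch.org/t/how-to-solve-this-torch-max-error/106432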
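If you are not sure which form your model returns, a small helper can cover the usual cases before calling torch.max. This is a minimal sketch, not an official API: `extract_logits` is a hypothetical name, and it assumes the scores are either the output itself, the first element of a tuple, or a `.logits` attribute (which Hugging Face sequence classification outputs provide). The demo uses plain Python stand-ins instead of real tensors so it runs without a model.

```python
from types import SimpleNamespace


def extract_logits(output):
    """Return the class scores from a model output of uncertain form.

    Checks, in order: an object with a `.logits` attribute, a tuple or list
    whose first element holds the scores, and otherwise the value itself.
    """
    if hasattr(output, "logits"):
        return output.logits
    if isinstance(output, (tuple, list)):
        return output[0]
    return output


# Stand-ins for the three output forms (assumption: real code passes tensors).
scores = "scores-tensor-placeholder"
as_object = SimpleNamespace(logits=scores)
as_tuple = (scores, "other outputs")

print(extract_logits(as_object) is scores)
print(extract_logits(as_tuple) is scores)
print(extract_logits(scores) is scores)
```

In the prediction function, the call would then read _, prediction = torch.max(extract_logits(output), dim=1).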