How can we do sentiment analysis and create a 'sentiment' record next to each line of text?

I Googled some solutions for doing sentiment analysis and writing the result into a column next to the column of text being analyzed. This is what I came up with.

# first, we import the relevant modules from the NLTK library
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# download the lexicon and tokenizer data that VADER needs
nltk.download('vader_lexicon')
nltk.download('punkt')

# next, we initialize VADER so we can use it within our Python script
sid = SentimentIntensityAnalyzer()

# the variable 'message_text' now contains the text we will analyze.
message_text = '''Like you, I am getting very frustrated with this process. I am genuinely trying to be as reasonable as possible. I am not trying to "hold up" the deal at the last minute. I'm afraid that I am being asked to take a fairly large leap of faith after this company (I don't mean the two of you -- I mean Enron) has screwed me and the people who work for me.'''

print(message_text)

# calling the polarity_scores method on sid and passing in message_text
# returns a dictionary with negative, neutral, positive, and compound
# scores for the input text
scores = sid.polarity_scores(message_text)

# loop through the keys in scores (compound, neg, neu, pos)
# and print each key-value pair on the screen
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')

This gives me:

compound: -0.3804, neg: 0.093, neu: 0.836, pos: 0.071, 

Now I am trying to feed in my own column of text from a dataframe.

The sample code above is from this site:

https://programminghistorian.org/en/lessons/sentiment-analysis

I have a field in my dataframe consisting of text, like the following.

These brush heads are okay!  Wish they came in a larger diameter, would cover more facial surface area and require less time to do the job!  However, I think they do a better job than just a face cloth in cleansing the pores.  I would recommend this product!
No opening to pee with. weird.  And really tight.  not very comfortable.
I choose it as spare parts always available and I will buy it again for sure!I will recommend it, without doubt!
love this cleanser!!
Best facial wipes invented!!!!!!(:

These are 5 separate records in my dataframe. I am trying to figure out how to evaluate each record as 'positive', 'negative', or 'neutral', and put each sentiment in a new field on the same row.

In this example, I think these 5 records would get the following 5 sentiments (in a field next to each record):

neutral
negative
positive
positive
positive

How can I do this?

I came up with another code sample, shown below.

event_dictionary ={scores["compound"] >= 0.05 : 'positive', scores["compound"] <= -0.05 : 'negative', scores["compound"] >= -0.05 and scores["compound"] <= 0.05 : 'neutral'} 
#message_text = str(message_text)
for message in message_text:
    scores = sid.polarity_scores(str(message))
    for key in sorted(scores):
        df['sentiment'] = df['body'].map(event_dictionary) 

This ran for about 15 minutes before I cancelled it, and I could see it had actually done nothing. I want to add a field named 'sentiment' and fill it with 'positive' if scores["compound"] >= 0.05, 'negative' if scores["compound"] <= -0.05, and 'neutral' if scores["compound"] >= -0.05 and scores["compound"] <= 0.05.

Not sure what your dataframe looks like, but you can run the sentiment intensity analyzer on each string to compute the polarity scores for each message. According to the GitHub page, you can use the "compound" key to score the sentiment of a message. (As an aside, the `event_dictionary` approach can't work: the boolean expressions in the keys are evaluated once, when the dictionary is built, so `df['body'].map(event_dictionary)` looks up message strings against `True`/`False` keys and never matches.)

https://github.com/cjhutto/vaderSentiment#about-the-scoring

# assumes `sid` has already been initialized as in the question
messages = [
    "These brush heads are okay!  Wish they came in a larger diameter, would cover more facial surface area and require less time to do the job!  However, I think they do a better job than just a face cloth in cleansing the pores.  I would recommend this product!",
    "No opening to pee with. weird.  And really tight.  not very comfortable.",
    "I choose it as spare parts always available and I will buy it again for sure!I will recommend it, without doubt!",
    "love this cleanser!!",
    "Best facial wipes invented!!!!!!(:",
]

for message in messages:
    scores = sid.polarity_scores(message)

    for key in sorted(scores):
        print('{0}: {1} '.format(key, scores[key]), end='')

    if scores["compound"] >= 0.05:
        print("\npositive\n")

    elif scores["compound"] <= -0.05:
        print("\nnegative\n")
    else:
        print("\nneutral\n")

Output:

compound: 0.8713 neg: 0.0 neu: 0.782 pos: 0.218
positive

compound: -0.7021 neg: 0.431 neu: 0.569 pos: 0.0
negative

compound: 0.6362 neg: 0.0 neu: 0.766 pos: 0.234
positive

compound: 0.6988 neg: 0.0 neu: 0.295 pos: 0.705
positive

compound: 0.7482 neg: 0.0 neu: 0.359 pos: 0.641
positive
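Since the original goal was a new 'sentiment' field on each row of the dataframe, the same thresholding can be applied with pandas. Here is a minimal sketch, assuming the text column is named `body` and using compound scores hardcoded from the output above (in practice you would compute them with `df["body"].apply(lambda t: sid.polarity_scores(str(t))["compound"])`):

```python
import pandas as pd

def label_from_compound(compound):
    # thresholds suggested in the VADER README:
    # >= 0.05 positive, <= -0.05 negative, otherwise neutral
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# compound scores as printed above by sid.polarity_scores(message)["compound"]
df = pd.DataFrame({
    "body": [
        "These brush heads are okay! ...",
        "No opening to pee with. weird. ...",
        "I choose it as spare parts always available ...",
        "love this cleanser!!",
        "Best facial wipes invented!!!!!!(:",
    ],
    "compound": [0.8713, -0.7021, 0.6362, 0.6988, 0.7482],
})

# write the label into a new 'sentiment' field on the same row
df["sentiment"] = df["compound"].apply(label_from_compound)
print(df[["sentiment"]])
```

This avoids the original `.map(event_dictionary)` problem because `apply` calls the labeling function on each row's score instead of looking the text up as a dictionary key.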