NLTK Vader SentimentIntensityAnalyzer Bigram
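Is there a way to add bigram rules to the VADER SentimentIntensityAnalyzer in Python? I tried updating the lexicon with a two-word entry, but it did not change the polarity scores. Thanks in advance!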
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()
#returns a compound score of -0.296
print(analyser.polarity_scores('no issues'))
analyser.lexicon['no issues'] = 0.0
#still returns a compound score of -0.296
print(analyser.polarity_scores('no issues'))
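There is no direct way to add a bigram to the VADER lexicon, because VADER scores text one token at a time, so a lexicon key that contains a space is never looked up. However, you can get the same effect with the following steps:
- Turn the bigram into a single token. For example, convert the bigram ("no issues") into the token ("noissues").
- Maintain a dictionary of polarities for the newly created tokens, e.g. {"noissues": 2}.
- Run an extra text-processing step before passing the text in for sentiment scoring.
The following code does the above: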
import nltk

allowed_bigrams = {'noissues': 2}  # add more as per your requirement
analyser.lexicon.update(allowed_bigrams)  # register the merged tokens so VADER can score them

def process_text(text):
    tokens = text.lower().split()  # list of tokens
    bigrams = list(nltk.bigrams(tokens))  # create bigrams as tuples of tokens
    bigrams = list(map(''.join, bigrams))  # join each pair without a space to form the merged token
    bigrams.append('...')  # pad so the tokens and bigrams lists have equal length
    # begin recreating the text
    final = ''
    for i, token in enumerate(tokens):
        b = bigrams[i]
        if b in allowed_bigrams:
            join_word = b  # replace the word in the text with the merged bigram
            tokens[i+1] = ''  # blank out the next word so it is skipped
        else:
            join_word = token
        final += join_word + ' '
    return final

text = 'Hello, I have no issues with you'
print(text)
print(analyser.polarity_scores(text))
final = process_text(text)
print(final)
print(analyser.polarity_scores(final))
Output:
Hello, I have no issues with you
{'neg': 0.268, 'neu': 0.732, 'pos': 0.0, 'compound': -0.296}
hello, i have noissues with you
{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.4588}
Notice in the output how the two words "no" and "issues" have been joined to form the bigram token "noissues".
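One limitation of splitting on whitespace is that a bigram followed by punctuation (for example "no issues,") would not match, and lowercasing the whole text discards capitalization elsewhere. As a rough sketch of a variation, not part of the original answer, the merge step could be done with a regular expression instead. The names `BIGRAM_VALENCES`, `merge_bigrams` and `merged_lexicon_entries` are made up for illustration, and the valence 2.0 simply mirrors the value used above.

```python
import re

# Hypothetical mapping from a two-word phrase to the valence of its merged token.
# The value 2.0 mirrors the answer above; choose values that suit your data.
BIGRAM_VALENCES = {"no issues": 2.0}


def merge_bigrams(text, bigram_valences=BIGRAM_VALENCES):
    """Replace each listed two-word phrase with a single space-free token."""
    for phrase in bigram_valences:
        first, second = phrase.split()
        # \b stops "no issues" from matching inside e.g. "piano issues".
        pattern = re.compile(
            r"\b" + re.escape(first) + r"\s+" + re.escape(second) + r"\b",
            flags=re.IGNORECASE,
        )
        # The replacement is the lowercase merged token, e.g. "noissues".
        text = pattern.sub(first + second, text)
    return text


def merged_lexicon_entries(bigram_valences=BIGRAM_VALENCES):
    """Build the {merged token: valence} dict to add to the analyser's lexicon."""
    return {"".join(p.split()): v for p, v in bigram_valences.items()}


print(merge_bigrams("Hello, I have no issues with you"))
print(merge_bigrams("No issues, thanks!"))
print(merged_lexicon_entries())
```

To use it with the analyser from the question, you would presumably call analyser.lexicon.update(merged_lexicon_entries()) once and then score analyser.polarity_scores(merge_bigrams(text)). Only the matched phrase is rewritten, so the rest of the text keeps its original casing and punctuation.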