NLTK: sentiment analysis always returns the same value

I'm sorry to post this, because the answer may already be in this post: NLTK sentiment analysis is only returning one value

or this post: Python NLTK not sentiment calculate correct

But I don't know how to apply those answers to my code.

I'm new to Python and NLTK, and I hate having to bother you with such a big chunk of code; sorry again.

With the code I'm using, I always get 'pos' as the result. I tried classifying with the positive features left out of the training set; then it always returns 'neutral'.

Can anyone tell me what I'm doing wrong? Thank you very much! Don't mind the random test sentence I'm using; it just came up while I was trying to figure out the problem.

import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.tokenize import word_tokenize
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english", ignore_stopwords=True)

pos_tweets = ['I love bananas','I like pears','I eat oranges']
neg_tweets = ['I hate lettuce','I do not like tomatoes','I hate apples']
neutral_tweets = ['I buy chicken','I am boiling eggs','I am chopping vegetables']

def uni(doc):
    x = []
    y = []
    for tweet in doc:
        x.append(word_tokenize(tweet))
    for element in x:
        for word in element:
            if len(word)>2:
                word = word.lower()
                word = stemmer.stem(word)
                y.append(word)
    return y

def word_feats_uni(doc):
     return dict([(word, True) for word in uni(doc)])

def tokenizer_ngrams(document):
    all_tokens = []
    for sentence in document:
        all_tokens.append(word_tokenize(sentence))
    return all_tokens

def get_bi (document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        c.extend([bigram for bigram in nltk.bigrams(sentence)])
    return c

def get_tri(document):
    x = tokenizer_ngrams(document)
    c = []
    for sentence in x:
        c.extend([trigram for trigram in nltk.trigrams(sentence)])
    return c

def word_feats_bi(doc): 
    return dict([(word, True) for word in get_bi(doc)])

def word_feats_tri(doc):
    return dict([(word, True) for word in get_tri(doc)])

def word_feats_test(doc):
    feats_test = {}
    feats_test.update(word_feats_uni(doc))
    feats_test.update(word_feats_bi(doc))
    feats_test.update(word_feats_tri(doc))
    return feats_test

pos_feats = [(word_feats_uni(pos_tweets),'pos')] + [(word_feats_bi(pos_tweets),'pos')] + [(word_feats_tri(pos_tweets),'pos')]

neg_feats = [(word_feats_uni(neg_tweets),'neg')] + [(word_feats_bi(neg_tweets),'neg')] + [(word_feats_tri(neg_tweets),'neg')]

neutral_feats = [(word_feats_uni(neutral_tweets),'neutral')] + [(word_feats_bi(neutral_tweets),'neutral')] + [(word_feats_tri(neutral_tweets),'neutral')]

trainfeats = pos_feats + neg_feats + neutral_feats

classifier = NaiveBayesClassifier.train(trainfeats)

print (classifier.classify(word_feats_test('I am chopping vegetables and boiling eggs')))

The fix is simple. For the sentence 'I am chopping vegetables and boiling eggs', your word_feats_test returns an empty dictionary, because iterating over a raw string yields individual characters rather than sentences; with no features to go on, the classifier falls back to 'pos'.
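You can see why with a stripped-down version of your uni (a sketch that uses str.split in place of word_tokenize and omits stemming, so it runs without any NLTK data):

```python
def uni(doc, tokenize=str.split):
    # same shape as the question's uni(): iterate over doc,
    # tokenize each element, keep lowercased tokens longer than 2 chars
    y = []
    for tweet in doc:
        for word in tokenize(tweet):
            if len(word) > 2:
                y.append(word.lower())
    return y

# a raw string iterates character by character, so no token is ever > 2 chars
print(uni('I am chopping vegetables'))    # -> []

# wrapped in a list, each element is a whole sentence
print(uni(['I am chopping vegetables']))  # -> ['chopping', 'vegetables']
```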

I wrapped your sentence in a list:

print(classifier.classify(word_feats_test(
      ['I am chopping vegetables and boiling eggs'])))

and it prints neutral.

You should use exactly the same functions to compute the features everywhere: for the training set, for the test set, and at classification time.
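For example, one shared feature function used both at training time and at classification time could look like this (a minimal sketch with str.split standing in for the tokenizer and a reduced training set, not your exact feature pipeline):

```python
from nltk.classify import NaiveBayesClassifier

def word_feats(sentences):
    # the single feature extractor: always takes a list of sentences
    feats = {}
    for sentence in sentences:
        for word in sentence.split():
            if len(word) > 2:
                feats[word.lower()] = True
    return feats

train = ([(word_feats([t]), 'pos') for t in ['I love bananas', 'I like pears']] +
         [(word_feats([t]), 'neg') for t in ['I hate lettuce', 'I hate apples']])

classifier = NaiveBayesClassifier.train(train)

# the same function, with the same list-of-sentences input, at classify time
print(classifier.classify(word_feats(['I hate lettuce'])))  # -> neg
```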