What algorithm is used in this Python sentiment analysis code?
I have a question about sentiment analysis. I have a dataset of tweets (about cryptocurrency), and I intend to perform sentiment analysis to get a positive or negative result for each tweet.
I found sentiment analysis code that works well, but since I am new to this field, I don't know which classification algorithm it uses. Here is the code:
# importing Libraries
from pandas import DataFrame, read_csv
import chardet
import matplotlib.pyplot as plt; plt.rcdefaults()
from matplotlib import rc
%matplotlib inline
import pandas as pd
plt.style.use('ggplot')
import numpy as np
import re
import warnings
#Visualisation
import matplotlib
import seaborn as sns
from IPython.display import display
from mpl_toolkits.basemap import Basemap
from wordcloud import WordCloud, STOPWORDS
#nltk
from nltk.stem import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.sentiment.util import *
from nltk import tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords
stop = stopwords.words('english')
matplotlib.style.use('ggplot')
pd.options.mode.chained_assignment = None
warnings.filterwarnings("ignore")
######### Sentiment Analysis code #########
tweets['text_lem'] = [''.join([WordNetLemmatizer().lemmatize(re.sub('[^A-Za-z]', ' ', line)) for line in lists]).strip() for lists in tweets['text']]
vectorizer = TfidfVectorizer(max_df=0.5, max_features=10000, min_df=10, stop_words='english', use_idf=True)
X = vectorizer.fit_transform(tweets['text_lem'].str.upper())
sid = SentimentIntensityAnalyzer()
tweets['sentiment_compound_polarity'] = tweets.text_lem.apply(lambda x: sid.polarity_scores(x)['compound'])
tweets['sentiment_neutral'] = tweets.text_lem.apply(lambda x: sid.polarity_scores(x)['neu'])
tweets['sentiment_negative'] = tweets.text_lem.apply(lambda x: sid.polarity_scores(x)['neg'])
tweets['sentiment_pos'] = tweets.text_lem.apply(lambda x: sid.polarity_scores(x)['pos'])
tweets['sentiment_type'] = ''
tweets.loc[tweets.sentiment_compound_polarity > 0, 'sentiment_type'] = 'POSITIVE'
tweets.loc[tweets.sentiment_compound_polarity == 0, 'sentiment_type'] = 'NEUTRAL'
tweets.loc[tweets.sentiment_compound_polarity < 0, 'sentiment_type'] = 'NEGATIVE'
Can anyone tell me more about this sentiment analysis code?
What algorithm does it use?
The classifier in this code is SentimentIntensityAnalyzer(). The documentation suggests it might be a NaiveBayesClassifier.
If you visit the original paper here, they also mention the NaiveBayesClassifier.
However, on the github project, the authors state:
The Python code for the rule-based sentiment analysis engine. Implements the grammatical and syntactical rules described in the paper, incorporating empirically derived quantifications for the impact of each rule on the perceived intensity of sentiment in sentence-level text.
So the algorithm in your code is a rule-based algorithm (VADER), not a machine learning algorithm. The code is here.
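To see those rules in action, here is a minimal sketch (not from the original post) that runs the same base sentence through SentimentIntensityAnalyzer with punctuation, capitalisation, a booster word, and a negation added. A pure word-lookup would score all of these alike, but the rule engine shifts the compound score for each variant (this assumes the vader_lexicon resource has been downloaded):

# Minimal sketch of VADER's rules: same lexicon word, different compound scores.
# Assumes nltk.download('vader_lexicon') has already been run.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
variants = [
    'The movie was good',            # plain lexicon hit
    'The movie was good!!!',         # punctuation amplification rule
    'The movie was GOOD!!!',         # capitalisation amplification rule
    'The movie was extremely good',  # degree-modifier (booster) rule
    'The movie was not good',        # negation rule flips polarity
]
for text in variants:
    print(text, '->', sid.polarity_scores(text)['compound'])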
Testing the library
Using the code from the paper:
hate_comments = ['I second that emotion! I can\'t understand how any decent human being could support them considering their ongoing loathsome record. #ToriesOut2018 #NHSCrisis #CambridgeAnalytica',
'Think we’d just share the ladder, Mikey pal. Nationalise all of the ladders and have a big old ladder party.',
'The Tories, young and old, do not understand that where child poverty, homelessness and the destruction of the NHS are concerned, there is absolutely nothing to smile about. Well done Lara.',
'I don\'t even like them!',
'Boom! Get in......',
'Me too',
'That\'s fine, but do it with a smile.',
'Yesss girl',
'Me too!',
'Ditto..',
'one day she will be all grown up .. ah bless',
'Who doesn\'t.',
'I hate them too Lara'
]
for sentence in hate_comments:
    print(sentence)
    ss = sid.polarity_scores(sentence)
    for k in ss:
        print('{0}: {1}, '.format(k, ss[k]), end='')
    print()
[out]:
I second that emotion! I can't understand how any decent human being could support them considering their ongoing loathsome record. #ToriesOut2018 #NHSCrisis #CambridgeAnalytica
neg: 0.0, neu: 0.87, pos: 0.13, compound: 0.4574,
Think we’d just share the ladder, Mikey pal. Nationalise all of the ladders and have a big old ladder party.
neg: 0.0, neu: 0.776, pos: 0.224, compound: 0.5994,
The Tories, young and old, do not understand that where child poverty, homelessness and the destruction of the NHS are concerned, there is absolutely nothing to smile about. Well done Lara.
neg: 0.244, neu: 0.702, pos: 0.055, compound: -0.806,
I don't even like them!
neg: 0.445, neu: 0.555, pos: 0.0, compound: -0.3404,
Boom! Get in......
neg: 0.0, neu: 1.0, pos: 0.0, compound: 0.0,
Me too
neg: 0.0, neu: 1.0, pos: 0.0, compound: 0.0,
That's fine, but do it with a smile.
neg: 0.0, neu: 0.518, pos: 0.482, compound: 0.5647,
Yesss girl
neg: 0.0, neu: 1.0, pos: 0.0, compound: 0.0,
Me too!
neg: 0.0, neu: 1.0, pos: 0.0, compound: 0.0,
Ditto..
neg: 0.0, neu: 1.0, pos: 0.0, compound: 0.0,
one day she will be all grown up .. ah bless
neg: 0.0, neu: 0.781, pos: 0.219, compound: 0.4215,
Who doesn't.
neg: 0.0, neu: 1.0, pos: 0.0, compound: 0.0,
I hate them too Lara
neg: 0.552, neu: 0.448, pos: 0.0, compound: -0.5719,
You can observe that messages which slip past the rules are not annotated correctly, e.g. Yesss girl or Me too!, which should have been positive.
If you can afford to label a large amount of text to predict sentiment, machine learning classifiers are usually better suited for these cases.
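For comparison, here is a minimal sketch of such a supervised setup with scikit-learn. It assumes a hypothetical hand-labelled tweets['label'] column (which your code does not have), reuses a TF-IDF vectorizer like the unused one in your snippet, and trains a logistic regression on top:

# Hypothetical supervised baseline; assumes tweets['label'] holds manual
# POSITIVE/NEGATIVE annotations, which the original code does not have.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    tweets['text_lem'], tweets['label'], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(max_features=10000, stop_words='english')
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))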