Positive, neutral and negative word frequency
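Before my final submission I need to make some corrections to my project: I need to count the positive, neutral and negative words in my code.
I did the same thing earlier to find word frequencies in the plain text, and that output was fine.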
    def gen_freq(text):
        word_list = []  # stores the list of words
        for words in text.split():  # loop over all the reviews and extract words into word_list
            word_list.extend(words)
        word_freq = pd.Series(word_list).value_counts()  # create word frequencies using word_list
        word_freq[:20]
        # print top 20 words
        print(word_freq)
        return word_freq[:20]

    gen_freq(dataset.text.str)
I tried to do the same thing to generate the frequency of the positive words:
    def positive_freq(text):
        positive_list = []  # stores the list of words
        for words in text.split():  # loop over all the reviews and extract words into positive_list
            positive_list.extend(words)
        word_freq = pd.Series(positive_list).value_counts()  # create word frequencies using positive_list
        word_freq[:20]
        # print top 20 words
        print(word_freq)
        return word_freq[:20]

    positive_freq(dataset.text.str)
I use this code to load the data:
    import json
    import pandas as pd

    with open('reviews.json') as project_file:
        data = json.load(project_file)
    dataset = pd.json_normalize(data)
    print(dataset.head())
The output of positive_freq looks like this:
and 136
a 127
the 114
iPad 102
I 69
...
"fully 1
didn't. 1
would 1
instructions...but 1
these 1
That should not be the case, because the adjectives identified as positive are these:
Positive:
polarity adjectives
1 0.209881 right
1 0.209881 mad
1 0.209881 full
1 0.209881 full
1 0.209881 iPad
1 0.209881 iPad
1 0.209881 bad
1 0.209881 different
1 0.209881 wonderful
1 0.209881 much
1 0.209881 affordable
2 0.633333 stop
2 0.633333 great
2 0.633333 awesome
3 0.437143 awesome
4 0.398333 max
4 0.398333 high
4 0.398333 high
4 0.398333 Gorgeous
5 0.466667 decent
5 0.466667 easy
6 0.265146 it’s
6 0.265146 bright
6 0.265146 wonderful
6 0.265146 amazing
6 0.265146 full
6 0.265146 few
6 0.265146 such
6 0.265146 facial
6 0.265146 Big
6 0.265146 much
8 0.161979 old
8 0.161979 little
8 0.161979 Easy
8 0.161979 daily
8 0.161979 that’s
8 0.161979 late
9 0.084762 few
9 0.084762 huge
9 0.084762 storage.If
9 0.084762 few
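Also, once the frequencies are generated, I want to plot a bar chart of frequency against each word. For example, if right has a frequency of 1 and awesome has a frequency of 2, the chart should show that. The same goes for the neutral and negative words. Please help.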
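Your problem is that you expect the machine to know which words are positive, negative or neutral. How would it know a positive word from .split()? You first need to provide a predefined list of positive/negative/neutral words and then, after splitting, check whether each token is in that list. You can get such a list from a sentiment lexicon such as SentiWordNet or SentiStrength, from many other dictionaries, or from an existing Python package. Example: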
    from textblob import TextBlob

    sent = 'a very simple and good sample'
    pos_word_list = []
    neg_word_list = []
    neu_word_list = []
    for word in sent.split():
        testimonial = TextBlob(word)
        if testimonial.sentiment.polarity >= 0.5:
            pos_word_list.append(word)
        elif testimonial.sentiment.polarity <= -0.5:
            neg_word_list.append(word)
        else:
            neu_word_list.append(word)
Output:
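To get from the classified words to per-class frequencies, something like the following might work. It is only a minimal sketch using the standard library: the `POSITIVE` and `NEGATIVE` sets are a made-up mini-lexicon standing in for SentiWordNet or TextBlob scores, and `normalize`, `sentiment_frequencies` and `text_bar_chart` are hypothetical helper names, not part of any package. Tokens are lowercased and stripped of punctuation first, so that Easy and easy are counted together.

```python
import string
from collections import Counter

# Hypothetical mini-lexicon for illustration only; a real project would load a
# sentiment lexicon or score each word with TextBlob instead.
POSITIVE = {"good", "great", "awesome", "wonderful", "easy"}
NEGATIVE = {"bad", "slow", "broken"}


def normalize(token):
    """Lowercase a token and strip surrounding punctuation and curly quotes."""
    return token.strip(string.punctuation + "“”‘’").lower()


def sentiment_frequencies(reviews):
    """Return one Counter of word frequencies per sentiment class."""
    counts = {"positive": Counter(), "negative": Counter(), "neutral": Counter()}
    for review in reviews:
        for token in review.split():
            word = normalize(token)
            if not word:
                continue  # the token was pure punctuation
            if word in POSITIVE:
                counts["positive"][word] += 1
            elif word in NEGATIVE:
                counts["negative"][word] += 1
            else:
                counts["neutral"][word] += 1
    return counts


def text_bar_chart(counter, top=20):
    """Render the most common words as one line of '#' marks per word."""
    return [f"{word:<12}{'#' * n} {n}" for word, n in counter.most_common(top)]


# Made-up sample reviews, not taken from reviews.json.
reviews = [
    "Awesome screen, great battery.",
    "Easy setup but the speaker is bad.",
    "Great value. Awesome!",
]
freqs = sentiment_frequencies(reviews)
for line in text_bar_chart(freqs["positive"]):
    print(line)
```

With the real data you would pass the review strings from the dataset's text column instead of the sample list. For a graphical bar chart, the words and counts from `most_common()` can be handed to a plotting library, for example matplotlib's `plt.bar`.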