How to count the occurrences of each word in a sentence, grouped by the sentence's score?
I have a user survey document:
Score Comment
8 Rapid bureaucratic affairs. Reports for policy...
4 There needs to be communication or feed back f...
7 service is satisfactory
5 Good
5 There is no
10 My main reason for the product is competition ...
9 Because I have not received the results. And m...
5 no reason
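I want to determine which keywords correspond to high scores and which correspond to low scores.
My idea is to build a table of words (or a "word vector" dictionary) that holds, for each word, the scores of the sentences it appears in and the number of times the word occurs with each of those scores.
Something like the following: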
Word     Score  Count
Word1:   7      1
         4      2
Word2:   5      1
         9      1
         3      2
         2      1
Word3:   9      3
Word4:   8      1
         9      1
         4      2
...      ...    ...
Then, for each word, the average score is the mean of all the scores associated with that word.
To do this, my code is as follows:
    word_vec = {}
    # col 1 is the word, col 2 is the score, col 3 is the number of times it occurs
    for i in range(len(data)):
        sentence = data['SurveyResponse'][i].split(' ')
        for word in sentence:
            word_vec['word'] = word
            if word in word_vec:
                word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':(word_vec[word]['NumberOfTimes'] += 1)}
            else:
                word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':1}
But this code gives me the following error:
File "<ipython-input-144-14b3edc8cbd4>", line 9
word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':(word_vec[word]['NumberOfTimes'] += 1)}
^
SyntaxError: invalid syntax
Can someone show me the correct way to do this?
Try this code:
    word_vec = {}
    # Maps each word to its accumulated score and the number of times it occurs
    for i in range(len(data)):
        sentence = data['SurveyResponse'][i].split(' ')
        for word in sentence:
            if word in word_vec:
                # Keep accumulating the total score for each word;
                # this makes it easier to find the average score later on
                word_vec[word]['Score'] += data['SCORE'][i]
                word_vec[word]['NumberOfTimes'] += 1
            else:
                word_vec[word] = {'Score': data['SCORE'][i], 'NumberOfTimes': 1}
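To increase the value of 'NumberOfTimes', increment it directly as its own statement: word_vec[word]['NumberOfTimes'] += 1. In Python, += is a statement rather than an expression, so it cannot appear inside a dictionary literal, which is what triggers the SyntaxError.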
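Once the totals are accumulated this way, the average score per word is just the total divided by the count. Here is a minimal sketch of that step. It assumes the survey is available as plain (score, comment) pairs rather than the DataFrame from the question; the `rows` list and the `average_score_per_word` helper are made up for illustration, and lowercasing the words is my own choice, not something the question asks for.

```python
# Hypothetical stand-in for the survey data: (score, comment) pairs
# copied from the untruncated rows shown in the question.
rows = [
    (7, "service is satisfactory"),
    (5, "Good"),
    (5, "There is no"),
    (5, "no reason"),
]


def average_score_per_word(rows):
    """Return {word: mean score over every occurrence of that word}."""
    totals = {}  # word -> {'Score': summed score, 'NumberOfTimes': count}
    for score, comment in rows:
        # split() without arguments also discards empty strings
        # produced by repeated spaces; lower() is an assumption.
        for word in comment.lower().split():
            if word in totals:
                totals[word]['Score'] += score
                totals[word]['NumberOfTimes'] += 1
            else:
                totals[word] = {'Score': score, 'NumberOfTimes': 1}
    # Divide each accumulated total by its count to get the average.
    return {w: t['Score'] / t['NumberOfTimes'] for w, t in totals.items()}


averages = average_score_per_word(rows)
print(averages['is'])  # "is" occurs in one 7-point and one 5-point comment
```

Punctuation is not stripped here, so a token like "affairs." would be counted separately from "affairs"; you would probably want to clean that up for real survey text.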
You can use collections.Counter. It counts the number of occurrences of each word.
For example:
from collections import Counter
c = Counter(["jsdf","ijoiuj","je","oui","je","non","oui","je"])
print(c)
Result:
Counter({'je': 3, 'oui': 2, 'ijoiuj': 1, 'jsdf': 1, 'non': 1})
Extract the words from your document and put them into a list. The Counter then processes that list to count the occurrences of each word.
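To get the Word / Score / Count table from the question, one option is to count (word, score) pairs instead of bare words. This is only a sketch under a few assumptions: the `rows` list is a hypothetical stand-in for the survey data (built from the untruncated comments in the question), and lowercasing is my own choice.

```python
from collections import Counter

# Hypothetical stand-in for the survey data: (score, comment) pairs
# copied from the untruncated rows shown in the question.
rows = [
    (7, "service is satisfactory"),
    (5, "Good"),
    (5, "There is no"),
    (5, "no reason"),
]

# Count how often each word occurs together with each score.
pair_counts = Counter(
    (word, score)
    for score, comment in rows
    for word in comment.lower().split()
)

# Print one line per (word, score) pair, sorted by word and then score.
for (word, score), count in sorted(pair_counts.items()):
    print(word, score, count)
```

Because the keys are tuples, `pair_counts[('no', 5)]` gives the number of times "no" appeared in a 5-point comment, and a pair that never occurred simply returns 0.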