How to assign different scores for sentiment analysis in R?
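I have a file of tweets on which I want/need to run sentiment analysis.
I came across this process, which works well, but now I want to change the code so that I can assign different scores depending on the sentiment.
Here is the code: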
score.sentiment = function(sentences, pos.words, neg.words, progress='none')
{
  require(plyr)
  require(stringr)
  scores = laply(sentences, function(sentence, pos.words, neg.words)
  {
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    sentence = tolower(sentence)
    word.list = str_split(sentence, '\\s+')
    words = unlist(word.list)
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  }, pos.words, neg.words, .progress=.progress)
  scores.df = data.frame(scores=scores, text=sentences)
  return(scores.df)
}
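What I would like to do now is use four dictionaries:
super.words, pos.words, neg.words, terrible.words.
I want to assign a different score to each of these dictionaries:
super.words=+2, pos.words=+1, neg.words=-1, terrible.words=-2.
I know that pos.matches = !is.na(pos.matches)
and neg.matches = !is.na(neg.matches)
turn the matches into TRUE/FALSE, which count as 1/0, but I want to find out how to assign these specific scores so that each tweet gets a score.
For the moment I am only concentrating on the standard two dictionaries, pos and neg.
I have assigned scores to these two data frames: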
posDF<-data.frame(words=pos, value=1, stringsAsFactors=F)
negDF<-data.frame(words=neg, value=-1, stringsAsFactors=F)
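and tried to run the algorithm above with them, but nothing works.
I have come across one page and this page, where someone wrote several 'for' loops, but the end result only gives an overall score of -1, 0 or 1.
Ultimately, I am looking for a result similar to this: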
table(analysis$score)
-5 -4 -3 -2 -1 0 1 2 3 4 5 6 19
3 8 49 164 603 2790 ..................等等
But so far, whenever I get a result that does not require me to "debug" the code, I get this:
< table of extent 0 >
Here are some sample tweets that I am using:
tweets<-data.frame(words=c(
  "@UKLabour @KarlTurnerMP #LabourManifesto Speaking as a carer, labours NHS plans are all good news, very happy. Making my day this!",
  "#LabourManifesto eggs and sweet things are looking evil",
  "@UKLabour @KarlTurnerMP Half way through the #LabourManifesto, this will definitely improve every-bodies lives if implemented fully.",
  "There is nothing \"long term\" about fossil fuels. #fracking #labourmanifesto https://twitter.com/stevetopple/status/587576796599595012",
  "Fair play Ed, very strong speech! Finally had the chance to watch it. #LabourManifesto wanna see the other manifestos nowwww"
))
Any help is much appreciated!
So, basically, I want to know whether there is a way to change this part of the original script:
pos.matches=match(words,pos.words)
neg.matches=match(words,neg.words)
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
so that I can assign my own specific scores (pos.words=+1, neg.words=-1)? Or do I have to incorporate various if and for loops?
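If you just want to use custom scores to produce the overall score, you can change this line: score=sum(pos.matches)-sum(neg.matches)
to:
score=sum(super.pos.matches)*2 + sum(pos.matches) + sum(neg.matches)*(-1) + sum(terrible.matches)*(-2)
if you are thinking of using four dictionaries. (In your function line, the "." in front of progress is missing.)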
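To make the arithmetic concrete, here is a minimal sketch of the weighted sum on a single, already tokenised sentence. The word lists and the words vector are made up for illustration; only the weights (+2, +1, -1, -2) come from the question.

```r
# Hypothetical example data; these word lists are NOT real lexicons.
words <- c("great", "good", "good", "bad", "awful", "the")
super.pos.words <- c("great")
pos.words       <- c("good")
neg.words       <- c("bad")
terrible.words  <- c("awful")

# %in% returns a logical vector; sum() counts its TRUE values,
# which is the same as sum(!is.na(match(words, dictionary))).
n.super    <- sum(words %in% super.pos.words)  # 1 match
n.pos      <- sum(words %in% pos.words)        # 2 matches
n.neg      <- sum(words %in% neg.words)        # 1 match
n.terrible <- sum(words %in% terrible.words)   # 1 match

# Each count is multiplied by the weight of its dictionary.
score <- 2 * n.super + 1 * n.pos - 1 * n.neg - 2 * n.terrible
print(score)  # 2*1 + 2 - 1 - 2*1 = 1
```

Note that each sum() has to be closed before its weight is applied. If everything is wrapped in one outer sum(), the scalar counts get added to every element of the logical vector and the total is inflated.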
The following code should help:
# the two extra dictionaries are passed in as arguments as well
score.sentiment = function(sentences, super.pos.words, pos.words, neg.words,
                           terrible.words, .progress='none')
{
  require(plyr)
  require(stringr)
  scores = laply(sentences, function(sentence, super.pos.words, pos.words,
                                     neg.words, terrible.words)
  {
    # strip punctuation, control characters and digits, then lower-case
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    sentence = tolower(sentence)
    # split the sentence into words
    word.list = str_split(sentence, '\\s+')
    words = unlist(word.list)
    # position of each word in each dictionary (NA if not found)
    pos.matches = match(words, pos.words)
    super.pos.matches = match(words, super.pos.words)
    neg.matches = match(words, neg.words)
    terrible.matches = match(words, terrible.words)
    # convert to TRUE/FALSE so that sum() counts the matches
    pos.matches = !is.na(pos.matches)
    super.pos.matches = !is.na(super.pos.matches)
    neg.matches = !is.na(neg.matches)
    terrible.matches = !is.na(terrible.matches)
    # weighted total: +2, +1, -1, -2
    score = sum(super.pos.matches)*2 + sum(pos.matches) - sum(neg.matches) -
      sum(terrible.matches)*2
    return(score)
  }, super.pos.words, pos.words, neg.words, terrible.words, .progress=.progress)
  scores.df = data.frame(scores=scores, text=sentences)
  return(scores.df)
}
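As an alternative closer to the posDF/negDF idea from the question, all dictionaries can be merged into one named vector of weights, so that a further dictionary only means further entries and no extra match() calls. This is only a sketch in base R: the function name score.weighted, the lexicon vector and its words are hypothetical and made up for illustration.

```r
# Hypothetical lexicon: names are words, values are their weights.
lexicon <- c(
  setNames(rep( 2, 2), c("fantastic", "superb")),
  setNames(rep( 1, 3), c("good", "happy", "strong")),
  setNames(rep(-1, 2), c("bad", "sad")),
  setNames(rep(-2, 2), c("evil", "terrible"))
)

score.weighted <- function(sentences, lexicon) {
  scores <- vapply(sentences, function(sentence) {
    # same clean-up steps as in score.sentiment
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    words <- unlist(strsplit(tolower(sentence), "\\s+"))
    # indexing by name returns NA for words that are not in the lexicon,
    # so na.rm = TRUE leaves only the weights of the matched words
    sum(lexicon[words], na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(scores = scores, text = sentences, stringsAsFactors = FALSE)
}

# Three of the sample tweets, shortened
tweets <- c(
  "#LabourManifesto eggs and sweet things are looking evil",
  "Fair play Ed, very strong speech! Finally had the chance to watch it.",
  "labours NHS plans are all good news, very happy."
)
result <- score.weighted(tweets, lexicon)
print(result$scores)         # -2  1  2
print(table(result$scores))
```

With real dictionaries the lexicon could be built the same way from the four word lists. A word that appears in more than one list takes the weight of its first occurrence in the vector.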