Sentiment analysis with R
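I am doing sentiment analysis with a word list whose entries carry scores in a 1-8 range, instead of counting each positive word as 1 and each negative word as -1.
Here is part of the list: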
word score
laughter 8.50
happiness 8.44
love 8.42
happy 8.30
laughed 8.26
laugh 8.22
How can I apply this list in the score.sentiment function so that I get score * word count rather than just a word count?
score.sentiment = function(sentences, new_list, .progress='none')
{
  require(plyr)
  require(stringr)
  # we got a vector of sentences. plyr will handle a list or a vector as an "l" for us
  # we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
  scores = laply(sentences, function(sentence, terms) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    # backslashes must be doubled inside R string literals
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    words.matches = match(words, terms)
    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    words.matches = !is.na(words.matches)
    # how to count the score??
    score = ?????
    return(score)
  }, new_list, .progress=.progress )
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}
Here is an example:
df <- read.table(header=TRUE, text="word score
laughter 8.50
happiness 8.44
love 8.42
happy 8.30
laughed 8.26
laugh 8.22")
sentence <- "I love happiness"
words <- strsplit(sentence, "\\s+")[[1]]
score <- sum(df$score[match(words, df$word)], na.rm = TRUE)
print(score)
# [1] 16.86
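Applying the same lookup inside the question's function could look like the sketch below. It is only one possible way to do it, under a few assumptions: the name score.sentiment.weighted and the argument name lexicon are made up for illustration, the word list is assumed to be a data frame with columns word and score, and base R's vapply stands in for plyr's laply so the snippet runs without extra packages. Because match() returns one index per word in the sentence, a word that occurs twice contributes its score twice, which gives the "score * word count" behaviour.

```r
# Hypothetical helper (not from the original post): sums the lexicon scores
# of every word in each sentence. `lexicon` is assumed to be a data frame
# with a character column `word` and a numeric column `score`.
score.sentiment.weighted <- function(sentences, lexicon) {
  scores <- vapply(sentences, function(sentence) {
    # same cleanup as in the question: drop punctuation, control characters, digits
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    sentence <- tolower(sentence)
    words <- unlist(strsplit(sentence, "\\s+"))
    # match() gives the lexicon row of each word, or NA when the word is absent
    idx <- match(words, lexicon$word)
    # look up the scores by row index; words without a match are ignored
    sum(lexicon$score[idx], na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

lexicon <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "word score
laughter 8.50
happiness 8.44
love 8.42
happy 8.30
laughed 8.26
laugh 8.22")

result <- score.sentiment.weighted(
  c("I love happiness", "We laughed, then laughed again!"),
  lexicon
)
# first sentence:  8.42 + 8.44 = 16.86
# second sentence: 8.26 * 2    = 16.52 (the repeated word is counted twice)
print(result)
```

If a per-word average is preferred over a sum, replacing sum(...) with mean(...) on the same indexed vector would be the natural variation, though the question as asked points to the sum.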