Sentiment Analysis in R using TDM/DTM
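I am trying to apply sentiment analysis in R using my DTM (document-term matrix) or TDM (term-document matrix). I could not find a similar topic on the forum or on Google. So far I have created a corpus and generated a dtm/tdm from it in R. My next step is to apply sentiment analysis, which I need later for stock prediction with an SVM. My code so far is: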
dtm <- DocumentTermMatrix(docs)
dtm <- removeSparseTerms(dtm, 0.99)
dtm <- as.data.frame(as.matrix(dtm))
tdm <- TermDocumentMatrix(docs)
tdm <- removeSparseTerms(tdm, 0.99)
tdm <- as.data.frame(as.matrix(tdm))
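I read that this is possible with the tidytext package using the get_sentiments() function, but I could not apply it to a DTM/TDM. How can I run sentiment analysis on my cleaned, filtered words, which have already been stemmed, tokenized, etc.? I have seen many people run sentiment analysis on whole sentences, but I would like to apply it to my individual words to see whether they are positive or negative, what their scores are, and so on. Thanks in advance!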
The SentimentAnalysis package integrates well with tm.
library(tm)
library(SentimentAnalysis)
documents <- c("Wow, I really like the new light sabers!",
"That book was excellent.",
"R is a fantastic language.",
"The service in this restaurant was miserable.",
"This is neither positive or negative.",
"The waiter forget about my dessert -- what poor service!")
vc <- VCorpus(VectorSource(documents))
dtm <- DocumentTermMatrix(vc)
analyzeSentiment(dtm,
rules=list(
"SentimentLM"=list(
ruleSentiment, loadDictionaryLM()
),
"SentimentQDAP"=list(
ruleSentiment, loadDictionaryQDAP()
)
)
)
# SentimentLM SentimentQDAP
# 1 0.000 0.1428571
# 2 0.000 0.0000000
# 3 0.000 0.0000000
# 4 0.000 0.0000000
# 5 0.000 0.0000000
# 6 -0.125 -0.2500000
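To make the scores above easier to read: as far as I understand it, ruleSentiment scores a document as (positive word count - negative word count) / total word count. Doc 6 has one negative term out of eight, which matches the -0.125. Here is a minimal base R sketch of that idea on a toy matrix. The matrix, the mini-lexicon and the score_documents() helper are made up for illustration and are not part of SentimentAnalysis.

```r
# Toy document-term matrix: rows are documents, columns are terms.
# With tm, as.matrix(dtm) gives a matrix in the same orientation.
dtm_mat <- matrix(
  c(1, 0, 1, 0,
    0, 1, 1, 1,
    0, 0, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"),
                  c("excellent", "poor", "service", "miserable"))
)

# Hypothetical mini-lexicon; real dictionaries are far larger
positive_terms <- c("excellent", "fantastic")
negative_terms <- c("poor", "miserable")

# Hypothetical helper: (positive count - negative count) / total count per document
score_documents <- function(m, positive, negative) {
  pos <- rowSums(m[, colnames(m) %in% positive, drop = FALSE])
  neg <- rowSums(m[, colnames(m) %in% negative, drop = FALSE])
  (pos - neg) / rowSums(m)
}

scores <- score_documents(dtm_mat, positive_terms, negative_terms)
scores
```

A document with no terms at all would give NaN here because of the division by zero, so empty documents should be dropped first.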
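To get sentiment from a dtm with tidytext, first convert the dtm to tidy format, then do an inner join between the tidy data and a lexicon of polarized words. I will use the same documents as above. Some of the documents in the example above are positive but were given neutral scores.
Let's see how tidytext performs.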
library(tidytext)
library(tm)
library(dplyr)
library(tidyr)
documents <- c("Wow I really like the new light sabers",
"That book was excellent",
"R is a fantastic language",
"The service in this restaurant was miserable",
"This is neither positive or negative",
"The waiter forget about my dessert -- what poor service")
# create tidy format
vectors <- as.character(documents)
v_source <- VectorSource(vectors)
corpuss <- VCorpus(v_source)
dtm <- DocumentTermMatrix(corpuss)
as_tidy <- tidy(dtm)
# use the bing lexicon; other lexicons (nrc, afinn) work as well
bing <- get_sentiments("bing")
as_bing_words <- inner_join(as_tidy,bing,by = c("term"="word"))
# check positive and negative words
as_bing_words
# add a numeric document index
index <- as_bing_words %>% mutate(doc = as.numeric(document))
# count matched words by document and sentiment,
# weighting by term frequency so repeated words count more than once
index <- index %>% count(sentiment, doc, wt = count)
# spread into positive and negative columns
index <- index %>% spread(sentiment, n, fill = 0)
# add polarity score
index <- index %>% mutate(polarity = positive - negative)
index
Docs 4 and 6 come out negative, 5 neutral, and the rest positive, which matches the actual sentiment.
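Since the question asks about individual words and already has the DTM as a data frame, here is a rough base R sketch of looking up each term directly. The small dtm_df and lexicon objects are stand-ins I made up: dtm_df plays the role of as.data.frame(as.matrix(dtm)), and lexicon has the same word/sentiment layout as get_sentiments("bing").

```r
# Stand-in for as.data.frame(as.matrix(dtm)): one row per document, one column per term
dtm_df <- data.frame(
  like      = c(1, 0, 0),
  excellent = c(0, 1, 0),
  poor      = c(0, 0, 1),
  service   = c(0, 0, 1),
  check.names = FALSE
)

# Hypothetical lexicon with the same columns as get_sentiments("bing")
lexicon <- data.frame(
  word      = c("like", "excellent", "poor"),
  sentiment = c("positive", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Total frequency of each term across all documents
term_freq <- data.frame(
  word = colnames(dtm_df),
  n    = colSums(dtm_df),
  stringsAsFactors = FALSE
)

# Keep only the terms found in the lexicon, with their sentiment label
term_sentiment <- merge(term_freq, lexicon, by = "word")
term_sentiment
```

For a TDM the terms are in the rows, so rownames() and rowSums() would take the place of colnames() and colSums(). One caveat: if the terms were stemmed, a stem such as "excel" will probably not match a lexicon entry like "excellent", so it may be better to join before stemming or to stem the lexicon the same way.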