如何使用 R 在 Wordcloud 中突出显示消极和积极的词

How to highlight negative and positive words in a Wordcloud using R

我正在使用 R 进行情感分析,我想知道如何将词云分成两部分,突出正面和负面的词。我对 R 很陌生,在线解决方案对我没有帮助。那就是代码:

text <- readLines("product1.txt")

library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")

docs <- Corpus(VectorSource(text))

toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\|")

docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removeWords, c("don", "s", "t")) 
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)

dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 10)

set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words=200, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))

这是我想要达到的结果:

谢谢大家的帮助。

编辑:

docs <- structure(list(content = c("This product so far has not disappointed. My children love to use it and I like the ability to monitor control what content they see with ease.", 
"Great for beginner or experienced person. Bought as a gift and she loves it.", 
"Inexpensive tablet for him to use and learn on, step up from the NABI. He was thrilled with it, learn how to Skype on it already.", 
"I have had my Fire HD 8 two weeks now and I love it. This tablet is a great value.We are Prime Members and that is where this tablet SHINES. I love being able to easily access all of the Prime content as well as movies you can download and watch laterThis has a 1280/800 screen which has some really nice look to it its nice and crisp and very bright infact it is brighter then the ipad pro costing 0 base model. The build on this fire is INSANELY AWESOME running at only 7.7mm thick and the smooth glossy feel on the back it is really amazing to hold its like the futuristic tab in ur hands."
), meta = structure(list(language = "en"), class = "CorpusMeta"), 
    dmeta = structure(list(), .Names = character(0), row.names = c(NA, 
    6L), class = "data.frame")), class = c("SimpleCorpus", "Corpus"
))

the tutorial 中所示,要获得这样的结果,您应该有一个词典,即 "dictionary" 告诉您单词是肯定的还是否定的。有了这些信息,您就可以用它来为您的词云着色。
我们可以在link评论漂亮的例子:

library(janeaustenr)
library(dplyr)
library(stringr)

# here we tidy up the corpus, all the J.Austen books, having them cleaned and as result, a tibble with words.
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\divxlc]", 
                                                 ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

library(wordcloud)
library(reshape2)

如前所述,您需要一个词典。 link 谈论各种词典,在这种情况下它使用 bing 一个:

get_sentiments("bing")
# A tibble: 6,788 x 2
   word        sentiment
   <chr>       <chr>    
 1 2-faced     negative 
 2 2-faces     negative 
 3 a+          positive 
 4 abnormal    negative 
 5 abolish     negative 
 6 abominable  negative 
 7 abominably  negative 
 8 abominate   negative 
 9 abomination negative 
10 abort       negative 
# ... with 6,778 more rows

现在,连接 tidy_books(语料库)和 bing(词典)的每个词,我们可以为每个词赋予正值或负值:

library(wordcloud)
library(reshape2)

 tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)

您将获得所需的输出。很明显,您必须将其转换为我没有的数据。

编辑:

根据您的情况,我们可以这样做:

# take all the phrases
docs1 <-tibble(phrases =docs$content)

# add an id, from 1 to n
docs1$ID <- row.names(docs1)

# split all the words
tidy_docs <- docs1 %>% unnest_tokens(word, phrases)

#create now the cloud: a pair of warnings, because you do not have negative words and it is joining by word(correct)
tidy_docs %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)

考虑这种方法。

library(flipTextAnalysis)
text.to.analyze <- input.phrases

# Converting the text to a vector
text.to.analyze <- as.character(text.to.analyze)

# Extracting the words from the text
library(flipTextAnalysis)
options <- GetTextAnalysisOptions(phrases = '', 
                                 extra.stopwords.text = 'amp',
                                 replacements.text = '',
                                 do.stem = TRUE,
                                 do.spell = TRUE)
text.analysis.setup <- InitializeWordBag(text.to.analyze, min.frequency = 5.0, operations = options$operations, manual.replacements = options$replacement.matrix, stoplist = options$stopwords, alphabetical.sort = FALSE, phrases = options$phrases, print.type = switch("Word Frequencies", "Word Frequencies" = "frequencies", "Transformed Text" = "transformations")) 

# Sentiment analysis of the phrases 
phrase.sentiment = SaveNetSentimentScores(text.to.analyze, check.simple.suffixes = TRUE, blanks.as.missing = TRUE) 
phrase.sentiment[phrase.sentiment >= 1] = 1
phrase.sentiment[phrase.sentiment <= -1] = -1

# Sentiment analysis of the words
td <- as.matrix(AsTermMatrix(text.analysis.setup, min.frequency = 1.0, sparse = TRUE))
counts <- text.analysis.setup$final.counts 
phrase.word.sentiment <- sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] <- NA # Setting missing values to Missing
word.mean <- apply(phrase.word.sentiment,2, FUN = mean, na.rm = TRUE)
word.sd <- apply(phrase.word.sentiment,2, FUN = sd, na.rm = TRUE)
word.n <- apply(!is.na(phrase.word.sentiment),2, FUN = sum, na.rm = TRUE)
word.se <- word.sd / sqrt(word.n)
word.z <- word.mean / word.se
word.z[word.n <= 3 || is.na(word.se)] <- 0        
words <- text.analysis.setup$final.tokens
x <- data.frame(word = words, 
      freq = counts, 
      "Sentiment" = word.mean,
      "Z-Score" = word.z,
      Length = nchar(words))
word.data <- x[order(counts, decreasing = TRUE), ]

# Working out the colors
n = nrow(word.data)
colors = rep("grey", n)
colors[word.data$Z.Score < -1.96] = "Red"
colors[word.data$Z.Score > 1.96] =  "Green"

# Creating the word cloud
library(wordcloud2)
wordcloud2(data = word.data[, -3], color = colors, size = 0.4)

我真的不喜欢特朗普,但这很好地说明了这一点。

此外,请参阅下面的两个链接,了解有关如何处理此类问题的更多想法。

http://rstudio-pubs-static.s3.amazonaws.com/71296_3f3ee76e8ef34410a1635926f740c473.html

https://www.analyticsvidhya.com/blog/2017/03/measuring-audience-sentiments-about-movies-using-twitter-and-text-analytics/