Delete words in sentiment lexicon in R

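I am using the nrc, bing and afinn lexicons for sentiment analysis in R.

Now I would like to remove some specific words from these lexicons, but I don't know how to do that, since the lexicons are not saved in my environment.

My code looks like this (using nrc as the example):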

MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  count(sentiment, sort = TRUE)

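Here are two ways to do it (no doubt there are more). First note that the nrc lexicon has 13,901 entries; a word appears once for each sentiment it is tagged with: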

> library(tidytext)
> library(dplyr)
> sentiments <- get_sentiments("nrc")
> sentiments
# A tibble: 13,901 x 2
   word        sentiment
   <chr>       <chr>    
 1 abacus      trust    
 2 abandon     fear     
 3 abandon     negative 
 4 abandon     sadness 
 5 abandoned   anger    
 6 abandoned   fear    
... and so on

You can filter out every word in a particular sentiment category (leaving fewer entries, 12,425):

> sentiments <- get_sentiments("nrc") %>% filter(sentiment!="fear")
> sentiments
# A tibble: 12,425 x 2 
   word        sentiment
   <chr>       <chr>    
 1 abacus      trust    
 2 abandon     negative 
 3 abandon     sadness  
 4 abandoned   anger    
 5 abandoned   negative 
 6 abandoned   sadness  
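
To drop more than one category at once, the same filter can be written with %in%. The following is only a sketch: toy_lexicon is a small hand-made tibble standing in for get_sentiments("nrc"), and toy_lexicon and drop_categories are names made up for this illustration.

```r
library(dplyr)

# Toy stand-in for get_sentiments("nrc"); rows copied from the output above
toy_lexicon <- tibble(
  word      = c("abacus", "abandon", "abandon", "abandon", "abandoned", "abandoned"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger", "fear")
)

# Hypothetical list of categories to remove
drop_categories <- c("fear", "anger")

# Keep only the rows whose sentiment is not in drop_categories
filtered <- toy_lexicon %>%
  filter(!sentiment %in% drop_categories)

filtered
```

With the real lexicon you would replace toy_lexicon with get_sentiments("nrc").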

Or you can create your own list of dropwords and remove them from the lexicon (leaving fewer entries, 13,884):

> dropwords <- c("abandon","abandoned","abandonment","abduction","aberrant")
> sentiments <- get_sentiments("nrc") %>% filter(!word %in% dropwords)
> sentiments
# A tibble: 13,884 x 2
   word       sentiment
   <chr>      <chr>    
 1 abacus     trust    
 2 abba       positive 
 3 abbot      trust    
 4 aberration disgust  
 5 aberration negative 
 6 abhor      anger    
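
Since the question also mentions bing and afinn, the same filter can be wrapped in a small function and reused, because each of these lexicons has a word column (bing pairs it with sentiment, afinn with a numeric value). This is a sketch under assumptions: remove_words is a hypothetical helper, and the toy tibbles and their scores are made up for illustration rather than taken from the real lexicons.

```r
library(dplyr)

# Hypothetical helper: drop every row whose word is in dropwords.
# Works for any lexicon data frame that has a `word` column.
remove_words <- function(lexicon, dropwords) {
  lexicon %>% filter(!word %in% dropwords)
}

# Toy stand-ins for get_sentiments("bing") and get_sentiments("afinn");
# the values are illustrative only
toy_bing  <- tibble(word      = c("abandon", "abhor", "abound"),
                    sentiment = c("negative", "negative", "positive"))
toy_afinn <- tibble(word  = c("abandon", "abhor", "ability"),
                    value = c(-2, -3, 2))

dropwords <- c("abandon", "abandoned")

bing_trimmed  <- remove_words(toy_bing, dropwords)
afinn_trimmed <- remove_words(toy_afinn, dropwords)
```

Words in dropwords that are not present in a given lexicon (here "abandoned") are simply ignored.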

Then just run the sentiment analysis with the sentiments object you created:

> library(gutenbergr)
> hgwells <- gutenberg_download(35) # loads "The Time Machine"
> hgwells %>% unnest_tokens(word,text) %>% 
      inner_join(sentiments) %>% count(word,sort=TRUE)
Joining, by = "word"
# A tibble: 1,077 x 2
   word         n
   <chr>    <int>
 1 white      236
 2 feeling    200
 3 time       200
 4 sun        145
 5 found      132
 6 darkness   108

Hope this helps.

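If you can make a data frame of the words you want to remove, you can exclude them with anti_join:

# Words to exclude; the column must be named word so anti_join can match on it
word_list <- c("words","to","remove")
words_to_remove <- data.frame(word=word_list)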

MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  anti_join(words_to_remove) %>%
  count(sentiment, sort = TRUE)
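
The anti_join can also be applied to the lexicon itself before the inner_join, which keeps a trimmed lexicon around for reuse. A minimal sketch, assuming toy data in place of MyTextFile and get_sentiments("nrc"); toy_lexicon, toy_tokens and trimmed_lexicon are names invented for this example, and by = "word" is spelled out so the join column is explicit:

```r
library(dplyr)

# Toy stand-in for get_sentiments("nrc")
toy_lexicon <- tibble(
  word      = c("abacus", "abandon", "abandon", "abba", "aberration"),
  sentiment = c("trust", "fear", "negative", "positive", "disgust")
)

# Words to exclude, stored in a column named `word`
words_to_remove <- tibble(word = c("abandon", "aberration"))

# Remove the unwanted words from the lexicon once
trimmed_lexicon <- toy_lexicon %>%
  anti_join(words_to_remove, by = "word")

# Toy stand-in for a tokenized text with one word per row
toy_tokens <- tibble(word = c("abacus", "abandon", "abba", "abba", "the"))

# Count sentiments using the trimmed lexicon
result <- toy_tokens %>%
  inner_join(trimmed_lexicon, by = "word") %>%
  count(sentiment, sort = TRUE)

result
```

Here "abandon" still occurs in the text but no longer contributes to the counts, because it was removed from the lexicon.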