Using a function to calculate a score, then putting it into a dataframe or tibble with the right variable

I am working on a function that I'd like to apply to each emotion in the NRC lexicon from a list, running a sentiment analysis (see: https://www.tidytextmining.com/sentiment.html#sentiment-analysis-with-inner-join) and then saving the score itself as a variable in a dataframe or tibble. I have the actual analysis part working, but saving it into the dataframe or tibble does not work.

#Creating List of All Emotions To Apply This To
emotion <- c('anger', 'disgust', 'joy', 'surprise', 'anticip', 'fear', 'sadness', 'trust')
#Initialize List with Length of Emotion Vector
wcount <- vector("list", length(emotion))

#Create Tibble for me to Deposit the Result Into
nrc_tib <- tibble(id="", 
                anger=numeric(0), 
                disgust=numeric(0), 
                joy=numeric(0), 
                surprise=numeric(0), 
                anticip=numeric(0), 
                fear=numeric(0), 
                sadness=numeric(0), 
                trust=numeric(0))
#Create Row to Deposit Variable Into
nrc_tib <- add_row(nrc_tib, id = "transcript1.txt")

#Defining Function
sentimentanalysis_nrc <- function(emoi) {

  #Getting Sentiment, Filtering by Emotion in List
  nrc_list <- get_sentiments("nrc") %>% 
    filter(sentiment == emoi)

  #Conducting Sentiment Analysis, Saving Results
  wcount[[emoi]] <- wordcount  %>%
    inner_join(nrc_list) %>%
    count(word, sort = TRUE)

    #Calculating Sentiment Score for Given Emotion
    score <- sum(wcount[[emoi]]$n)

    #Saving Emotion in nrc_tib, which is the part that doesn't work
    nrc_tib$emoi <- score
}

#Running the Function
lapply(emotion, FUN = sentimentanalysis_nrc)


I've tried a few different things, including putting emoi in brackets on the line that doesn't work, but some googling suggested that this isn't allowed. What would be allowed if I want to save it?

Note, in case it helps for context: this example uses the file transcript1.txt, but my eventual goal is to generalize this to transcript2.txt through transcript45.txt, binding the scores for all 45 transcripts afterwards.

Edit: I came up with a clumsy solution, using:

nrc_tib <<- replace(nrc_tib, emoi, score)

But there must be a better solution than this.

One of the big benefits of using tidy data principles is that problems like this become very manageable! You can do this using joins.

I'll use Jane Austen's novels as the example, since you didn't post sample data. Think of each book as one of your transcripts. The first step is to tidy the text data using unnest_tokens().
library(tidyverse)
library(tidytext)
library(janeaustenr)

tidy_books <- austen_books() %>%
  unnest_tokens(word, text)

tidy_books
#> # A tibble: 725,055 x 2
#>    book                word       
#>    <fct>               <chr>      
#>  1 Sense & Sensibility sense      
#>  2 Sense & Sensibility and        
#>  3 Sense & Sensibility sensibility
#>  4 Sense & Sensibility by         
#>  5 Sense & Sensibility jane       
#>  6 Sense & Sensibility austen     
#>  7 Sense & Sensibility 1811       
#>  8 Sense & Sensibility chapter    
#>  9 Sense & Sensibility 1          
#> 10 Sense & Sensibility the        
#> # … with 725,045 more rows

Then you can perform the sentiment analysis using inner_join(). Notice that with this join, each sentiment is successfully matched to each word (where appropriate) as many times as that word appears in the dataframe.

tidy_books %>%
  inner_join(get_sentiments("nrc"))
#> Joining, by = "word"
#> # A tibble: 177,363 x 3
#>    book                word        sentiment   
#>    <fct>               <chr>       <chr>       
#>  1 Sense & Sensibility sense       positive    
#>  2 Sense & Sensibility sensibility positive    
#>  3 Sense & Sensibility long        anticipation
#>  4 Sense & Sensibility respectable positive    
#>  5 Sense & Sensibility respectable trust       
#>  6 Sense & Sensibility general     positive    
#>  7 Sense & Sensibility general     trust       
#>  8 Sense & Sensibility good        anticipation
#>  9 Sense & Sensibility good        joy         
#> 10 Sense & Sensibility good        positive    
#> # … with 177,353 more rows

Now you can count() up a sentiment score for each book (transcript, in your case) and emotion/affect.

tidy_books %>%
  inner_join(get_sentiments("nrc")) %>%
  count(book, sentiment)
#> Joining, by = "word"
#> # A tibble: 60 x 3
#>    book                sentiment        n
#>    <fct>               <chr>        <int>
#>  1 Sense & Sensibility anger         1343
#>  2 Sense & Sensibility anticipation  3698
#>  3 Sense & Sensibility disgust       1172
#>  4 Sense & Sensibility fear          1861
#>  5 Sense & Sensibility joy           3364
#>  6 Sense & Sensibility negative      4005
#>  7 Sense & Sensibility positive      7429
#>  8 Sense & Sensibility sadness       2064
#>  9 Sense & Sensibility surprise      1589
#> 10 Sense & Sensibility trust         4222
#> # … with 50 more rows
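If you want one column per emotion, like the nrc_tib you set up in your question, you can reshape this counted output with tidyr's pivot_wider(). This is a sketch assuming a reasonably recent tidyr (it loads with the tidyverse); sentiment_scores is just a name I've picked here:

```r
library(tidyverse)
library(tidytext)
library(janeaustenr)

# Count sentiment words per book, then spread each sentiment into its own column
sentiment_scores <- austen_books() %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("nrc")) %>%
  count(book, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0)

# sentiment_scores now has one row per book and one numeric column per
# emotion (anger, anticipation, disgust, fear, joy, ...), so no empty
# tibble needs to be initialized and filled by hand
```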

You can even pipe straight into making a plot!

tidy_books %>%
  inner_join(get_sentiments("nrc")) %>%
  count(book, sentiment) %>%
  ggplot(aes(sentiment, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, scales = "free_y") +
  coord_flip()
#> Joining, by = "word"

Created on 2019-12-13 by the reprex package (v0.3.0)