Using a function to calculate a score, then putting it into a data frame or tibble under the right variable
I am working on a function that should run sentiment analysis for each emotion in the NRC lexicon from a list (see: https://www.tidytextmining.com/sentiment.html#sentiment-analysis-with-inner-join), and then save each score as a variable in a data frame or tibble. I have the actual analysis part working, but saving the result into the data frame or tibble does not work.
#Creating List of All Emotions To Apply This To
emotion <- c('anger', 'disgust', 'joy', 'surprise', 'anticip', 'fear', 'sadness', 'trust')
#Initialize List with Length of Emotion Vector
wcount <- vector("list", length(emotion))
#Create Tibble for me to Deposit the Result Into
nrc_tib <- tibble(id = "",
                  anger = numeric(0),
                  disgust = numeric(0),
                  joy = numeric(0),
                  surprise = numeric(0),
                  anticip = numeric(0),
                  fear = numeric(0),
                  sadness = numeric(0),
                  trust = numeric(0))
#Create Row to Deposit Variable Into
nrc_tib <- add_row(nrc_tib, id = "transcript1.txt")
#Defining Function
sentimentanalysis_nrc <- function(emoi) {
  #Getting Sentiment, Filtering by Emotion in List
  nrc_list <- get_sentiments("nrc") %>%
    filter(sentiment == emoi)
  #Conducting Sentiment Analysis, Saving Results
  wcount[[emoi]] <- wordcount %>%
    inner_join(nrc_list) %>%
    count(word, sort = TRUE)
  #Calculating Sentiment Score for Given Emotion
  score <- sum(wcount[[emoi]]$n)
  #Saving Emotion in nrc_tib, which is the part that doesn't work
  nrc_tib$emoi <- score
}
#Running the Function
lapply(emotion, FUN = sentimentanalysis_nrc)
I have tried a few different things, including putting emoi in brackets on the line that doesn't work, which some Googling suggests isn't allowed. What would be allowed if I do want to save it?
Note: in case this helps for context... this example uses the file transcript1.txt, but my end goal is to generalize it to transcript2.txt through transcript45.txt, then bind the scores for all 45 transcripts together.
Edit: I came up with a clunky solution using:
nrc_tib <<- replace(nrc_tib, emoi, score)
but there must be a better solution than this.
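For what it's worth, the specific reason `nrc_tib$emoi <- score` fails is that `$` treats `emoi` as a literal column name rather than evaluating it as a variable; `[[` does evaluate its argument. A cleaner pattern still is to have the function return the score and assemble the results afterwards. Here is a minimal sketch of that idea; the `toy_lexicon` and `wordcount` values are invented stand-ins for `get_sentiments("nrc")` and your real word counts:

```r
library(dplyr)
library(tibble)

# Toy stand-ins for get_sentiments("nrc") and the real per-transcript counts
toy_lexicon <- tibble(word = c("happy", "angry"), sentiment = c("joy", "anger"))
wordcount   <- tibble(word = c("happy", "angry", "happy"), n = c(2, 1, 3))

# Return the score instead of assigning into a global tibble from inside
emotion_score <- function(emoi, counts, lexicon) {
  lexicon %>%
    filter(sentiment == emoi) %>%
    inner_join(counts, by = "word") %>%
    summarise(score = sum(n)) %>%
    pull(score)
}

# sapply() gives a named numeric vector: one score per emotion
scores <- sapply(c("joy", "anger"), emotion_score,
                 counts = wordcount, lexicon = toy_lexicon)
# scores is now a named vector with joy = 5 and anger = 1
```

Returning values and collecting them avoids `<<-` entirely.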
One of the big benefits of using tidy data principles is that problems like this become very tractable! You can do this with joins.
I'll use Jane Austen's novels as an example, since you didn't post sample data. Think of each book as one of your transcripts. The first step is to tidy the text data using unnest_tokens().
library(tidyverse)
library(tidytext)
library(janeaustenr)
tidy_books <- austen_books() %>%
unnest_tokens(word, text)
tidy_books
#> # A tibble: 725,055 x 2
#> book word
#> <fct> <chr>
#> 1 Sense & Sensibility sense
#> 2 Sense & Sensibility and
#> 3 Sense & Sensibility sensibility
#> 4 Sense & Sensibility by
#> 5 Sense & Sensibility jane
#> 6 Sense & Sensibility austen
#> 7 Sense & Sensibility 1811
#> 8 Sense & Sensibility chapter
#> 9 Sense & Sensibility 1
#> 10 Sense & Sensibility the
#> # … with 725,045 more rows
Then you can perform sentiment analysis with inner_join(). Notice that with this join, you successfully match each emotion to each word (where appropriate; those words appear in this data frame more than once).
tidy_books %>%
inner_join(get_sentiments("nrc"))
#> Joining, by = "word"
#> # A tibble: 177,363 x 3
#> book word sentiment
#> <fct> <chr> <chr>
#> 1 Sense & Sensibility sense positive
#> 2 Sense & Sensibility sensibility positive
#> 3 Sense & Sensibility long anticipation
#> 4 Sense & Sensibility respectable positive
#> 5 Sense & Sensibility respectable trust
#> 6 Sense & Sensibility general positive
#> 7 Sense & Sensibility general trust
#> 8 Sense & Sensibility good anticipation
#> 9 Sense & Sensibility good joy
#> 10 Sense & Sensibility good positive
#> # … with 177,353 more rows
Now you can count() up a sentiment score for each book (transcript, in your case) and emotion/affect.
tidy_books %>%
inner_join(get_sentiments("nrc")) %>%
count(book, sentiment)
#> Joining, by = "word"
#> # A tibble: 60 x 3
#> book sentiment n
#> <fct> <chr> <int>
#> 1 Sense & Sensibility anger 1343
#> 2 Sense & Sensibility anticipation 3698
#> 3 Sense & Sensibility disgust 1172
#> 4 Sense & Sensibility fear 1861
#> 5 Sense & Sensibility joy 3364
#> 6 Sense & Sensibility negative 4005
#> 7 Sense & Sensibility positive 7429
#> 8 Sense & Sensibility sadness 2064
#> 9 Sense & Sensibility surprise 1589
#> 10 Sense & Sensibility trust 4222
#> # … with 50 more rows
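If you still want the wide, one-row-per-transcript layout you sketched out with `nrc_tib`, you can reshape this long count output with tidyr's pivot_wider(). A sketch on toy counts (the numbers here are invented, just mirroring the shape of the count() result above):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented counts in the same long shape as count(book, sentiment) above
scores <- tribble(
  ~book,         ~sentiment, ~n,
  "transcript1", "anger",     3,
  "transcript1", "joy",       7,
  "transcript2", "anger",     1,
  "transcript2", "joy",       4
)

# One row per book/transcript, one column per emotion; missing pairs become 0
wide <- scores %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0)
```

This yields exactly the tibble the question was trying to build by hand, without any pre-allocation or assignment inside a function.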
You can even pipe straight into making a plot!
tidy_books %>%
inner_join(get_sentiments("nrc")) %>%
count(book, sentiment) %>%
ggplot(aes(sentiment, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~book, scales = "free_y") +
coord_flip()
#> Joining, by = "word"
Created on 2019-12-13 by the reprex package (v0.3.0)
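To generalize this to your 45 transcripts, build one tidy data frame with a transcript id column before tokenizing; the per-transcript counts then fall out of the same count() call. A sketch using in-memory text (in practice you would fill `texts` by reading your transcript*.txt files, e.g. with readLines() in a loop; the names and sentences here are hypothetical):

```r
library(dplyr)
library(tibble)
library(tidytext)

# Stand-in for reading transcript1.txt ... transcript45.txt from disk
texts <- c(transcript1 = "what a happy joyful day",
           transcript2 = "that was a terrible angry scene")

# One row per transcript, then one row per word after unnest_tokens()
tidy_transcripts <- tibble(transcript = names(texts), text = texts) %>%
  unnest_tokens(word, text)
# From here, inner_join(get_sentiments("nrc")) and count(transcript, sentiment)
# work exactly as in the book example, with `transcript` in place of `book`.
```

Because the transcript id travels with every token, no per-file loop over emotions is needed at all.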