Extract emotions calculation for every row of a dataframe
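I have a dataframe with rows of text. For each row of text I want to extract a vector of specific emotions, where each entry is binary: 0 if the emotion is absent, 1 if it is present.
There are 5 emotions in total, but I only want a 1 for the emotion that appears most.
What I have tried: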
library(tidytext)
text = data.frame(id = c(11,12,13), text=c("bad movie","good movie","I think it would benefit religious people to see things like this, not just to learn about our home, the Universe, in a fun and easy way, but also to understand that non- religious explanations don't leave people hopeless and"))
nrc_lexicon <- get_sentiments("nrc")
Example of the expected output:
id text sadness anger joy love neutral
11 "bad movie" 1 0 0 0 0
12 "good movie" 0 0 1 0 0
Any hints would help.
What would the next step look like for each row?
How do I run the nrc lexicon analysis on each row?
for (i in 1:nrow(text)) {
  # ???(text$text[i], nrc_lexicon)   <- which function do I call here?
}
How about this:
library(tidytext) # library for text
library(dplyr)
# your data
text <- data.frame(id = c(11,12,13),
text=c("bad movie","good movie","I think it would benefit religious
people to see things like this, not just to learn about our home,
the Universe, in a fun and easy way, but also to understand that non- religious
explanations don't leave people hopeless and"), stringsAsFactors = FALSE) # stringsAsFactors = FALSE keeps the text as character (the default since R 4.0.0)
# the lexicon
nrc_lexicon <- get_sentiments("nrc") # needs the textdata package; the NRC lexicon is downloaded on first use
# now the job
unnested <- text %>%
unnest_tokens(word, text) %>% # unnest the words
left_join(nrc_lexicon, by = "word") %>% # join with the lexicon to get the sentiments of each word
left_join(text, by = "id") # join back to your data to get the original text for each id
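The pipeline above tokenizes every row at once and joins the words against the lexicon, which replaces the for loop from the question. For comparison, the loop itself can also be made to work. Below is a minimal base-R sketch that needs neither tidytext nor the NRC download: `toy_lexicon` is a hypothetical six-row stand-in for `get_sentiments("nrc")`, the helper `count_sentiments()` is illustrative, and its tokenizer is much cruder than `unnest_tokens()`.

```r
# Hypothetical toy lexicon standing in for get_sentiments("nrc"):
# long format, one row per (word, sentiment) pair, like NRC.
toy_lexicon <- data.frame(
  word      = c("bad", "bad", "good", "good", "hopeless", "hopeless"),
  sentiment = c("anger", "sadness", "joy", "trust", "fear", "sadness"),
  stringsAsFactors = FALSE
)

text <- data.frame(id = c(11, 12, 13),
                   text = c("bad movie", "good movie",
                            "explanations don't leave people hopeless"),
                   stringsAsFactors = FALSE)

emotions <- sort(unique(toy_lexicon$sentiment))

# Illustrative helper: count lexicon sentiments in one string.
# Crude tokenizer: lower-case, then split on anything that is not a letter or apostrophe.
count_sentiments <- function(s, lexicon, levels) {
  words <- unlist(strsplit(tolower(s), "[^a-z']+"))
  # merge() is an inner join, so a word occurring twice is counted twice
  hits <- merge(data.frame(word = words, stringsAsFactors = FALSE),
                lexicon, by = "word")
  # factor() with fixed levels gives every emotion a slot, even with 0 hits
  as.integer(table(factor(hits$sentiment, levels = levels)))
}

# The loop from the question: one row of counts per row of text
counts <- matrix(0L, nrow = nrow(text), ncol = length(emotions),
                 dimnames = list(text$id, emotions))
for (i in seq_len(nrow(text))) {
  counts[i, ] <- count_sentiments(text$text[i], toy_lexicon, emotions)
}
counts
```

Each row of `counts` holds, for one id, how many matched words carry each sentiment: with the toy data, id 11 gets anger and sadness, id 12 gets joy and trust, and id 13 gets fear and sadness.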
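Here is the output by id. You could also label the rows with the titles instead, but I didn't because the third one is too long; to do that, simply use unnested$text instead of unnested$id: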
table_sentiment <- table(unnested$id, unnested$sentiment)
table_sentiment
anger anticipation disgust fear joy negative positive sadness surprise trust
11 1 0 1 1 0 1 0 1 0 0
12 0 1 0 0 1 0 1 0 1 1
13 0 1 0 1 1 2 3 2 1 0
If you want it as a data.frame:
df_sentiment <- as.data.frame.matrix(table_sentiment)
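`as.data.frame.matrix()` keeps the ids only as row names. To get the layout of the question's expected output (id, text, then one column per emotion), the row names can be turned back into an `id` column and merged with the original data. A small base-R sketch; `df_sentiment` here is a hand-made stand-in with made-up values, not real NRC output:

```r
# Hypothetical stand-ins: `text` as in the question, and `df_sentiment`
# shaped like as.data.frame.matrix(table_sentiment): ids stored as row names.
text <- data.frame(id = c(11, 12),
                   text = c("bad movie", "good movie"),
                   stringsAsFactors = FALSE)
df_sentiment <- data.frame(anger = c(1L, 0L), joy = c(0L, 1L),
                           row.names = c("11", "12"))

# Turn the row names back into a numeric id column, then join on it
df_sentiment$id <- as.numeric(rownames(df_sentiment))
result <- merge(text, df_sentiment, by = "id")
result
```

`merge()` puts the `by` column first, followed by the remaining columns of `text` and then those of `df_sentiment`, which matches the column order of the expected output.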
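Now you can do whatever you like with it. For example, if I understood correctly, you want a binary output indicating whether each emotion is present: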
df_sentiment[df_sentiment>1]<-1
df_sentiment
anger anticipation disgust fear joy negative positive sadness surprise trust
11 1 0 1 1 0 1 0 1 0 0
12 0 1 0 0 1 0 1 0 1 1
13 0 1 0 1 1 1 1 1 1 0
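The question actually asked for a 1 only for the emotion that occurs most often, not for every emotion present. Below is a base-R sketch starting from the counts in `table_sentiment` above, re-typed by hand so the block runs on its own. Two assumptions are built in: NRC's `positive` and `negative` columns are sentiment polarity rather than emotions, so they are dropped first; and ties need a rule, shown here both as "keep all tied emotions" and as "take the first tied column" via `max.col()`.

```r
# Counts re-typed from the table_sentiment output above (ids as row names)
counts <- matrix(
  c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0,
    0, 1, 0, 0, 1, 0, 1, 0, 1, 1,
    0, 1, 0, 1, 1, 2, 3, 2, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("11", "12", "13"),
                  c("anger", "anticipation", "disgust", "fear", "joy",
                    "negative", "positive", "sadness", "surprise", "trust")))

# Keep only the eight NRC emotions; positive/negative are polarity labels
counts <- counts[, !colnames(counts) %in% c("negative", "positive"), drop = FALSE]

# Option 1: 1 for every emotion that reaches the row maximum (ties all kept);
# the counts > 0 check leaves rows without any matched word all 0
row_max  <- apply(counts, 1, max)
dominant <- (counts == row_max & counts > 0) * 1L

# Option 2: exactly one 1 per row, breaking ties by taking the first column;
# rows without any matched word are skipped so they stay all 0
top     <- max.col(counts, ties.method = "first")
one_hot <- matrix(0L, nrow(counts), ncol(counts), dimnames = dimnames(counts))
idx     <- cbind(seq_len(nrow(counts)), top)[row_max > 0, , drop = FALSE]
one_hot[idx] <- 1L

dominant
one_hot
```

With these counts, id 13 has a clear winner (sadness), but ids 11 and 12 tie across four emotions each. With `ties.method = "first"`, "bad movie" maps to anger rather than the sadness shown in the expected output, so the tie-breaking rule is a choice to make deliberately.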