Twitter Data Sentiment Analysis
I'm new to this, so apologies if my question is trivial.
I'm trying to run a sentiment analysis on some Twitter data I downloaded, but I've run into a problem. I'm trying to follow an example that
creates a bar chart showing positive/negative sentiment counts. The example's code is here:
original_books %>%
  unnest_tokens(output = word, input = text) %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n) %>%
  mutate(sent_score = positive - negative) %>%
  ggplot() +
  geom_col(aes(x = index, y = sent_score,
               fill = book),
           show.legend = FALSE) +
  facet_wrap(~book, scales = "free_x")
Here's the code I have so far for my own analysis:
# twitter scraping
ref <- search_tweets(
  "#refugee", n = 18000, include_rts = FALSE, lang = "en"
)

data(stop_words)
new_stops <- tibble(word = c("https", "t.co", "1", "refugee", "#refugee", "amp", "refugees",
                             "day", "2022", "dont", "0", "2", "@refugees", "4", "2021"),
                    lexicon = "sabs")
full_stop <- stop_words %>%
  bind_rows(new_stops)  # bind_rows appends rows (a way to merge data)
Now I'd like to make a bar chart similar to the one above, but I get an error because I don't have a column named "index". I tried to create one, but it didn't work. Here's the code I tried:
ref %>%
  unnest_tokens(word, text, token = "tweets") %>%
  anti_join(full_stop) %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, index, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n) %>%
  mutate(sent_score = positive - negative) %>%
  ggplot() +  # plot the overall sentiment (pos - neg) versus index
  geom_col(aes(x = index, y = sent_score), show.legend = FALSE)
Here's a picture of the error: [error screenshot]
Any advice is much appreciated! Thanks.
Contents of ref: [screenshots of the data frame]
In the example, index
just refers to a group of lines from the book, in order (i.e. 1, 2, 3...). It's a way of grouping the text - you can think of it like a page, which is also in numerical order. The text is simply split into groups of some kind so that sentiment can be counted within each group. Tweets are natural groups of words, and you want to count the sentiment within an individual tweet - you don't need to split them up any further. In the example there is one bar per "page" in the plot; here there will be one bar per tweet. You need to assign sequential numbers to the tweets, since they have no natural order. I did that below with rowid_to_column()
, naming the new column "tweet". It simply holds the row number of each tweet, so once the ref
data frame is split into words, each word is still associated with the original tweet that number belongs to.
Note that many tweets don't have enough words with an associated sentiment to compute a sentiment score, so I re-assigned a sequential number to the ones that do - that column is called "index".
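As a minimal sketch of that numbering idea (using two made-up tweets, not real data), numbering rows before tokenizing means every word keeps the id of the tweet it came from:

```r
library(dplyr)
library(tibble)
library(tidytext)  # for unnest_tokens

# two made-up tweets (hypothetical data, purely to illustrate)
toy <- tibble(text = c("Refugees need support", "Terrible crisis"))

# rowid_to_column() numbers the tweets before tokenizing, so after
# unnest_tokens() every word still carries its tweet's number
toy_words <- toy %>%
  rowid_to_column("tweet") %>%
  unnest_tokens(word, text)

toy_words
#> tweet 1 contributes "refugees", "need", "support";
#> tweet 2 contributes "terrible", "crisis"
```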
I also added the argument values_fill = 0
to the pivot_wider()
call, because tweets with only positive (or only negative) sentiment were being dropped: the other value was NA rather than 0.
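The effect of values_fill = 0 can be seen on a tiny made-up example (hypothetical counts, not the real data): a tweet with only positive words would otherwise get NA for negative, making its score NA too.

```r
library(dplyr)
library(tibble)
library(tidyr)   # for pivot_wider

# made-up sentiment counts: tweet 1 has only positive words,
# tweet 2 has both kinds
counts <- tribble(
  ~tweet, ~sentiment, ~n,
  1L, "positive", 2L,
  2L, "positive", 1L,
  2L, "negative", 3L
)

# without values_fill, tweet 1 would get negative = NA and
# positive - negative would be NA; values_fill = 0 keeps it scored
scores <- counts %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sent_score = positive - negative)

scores
#> tweet 1: positive 2, negative 0, sent_score  2
#> tweet 2: positive 1, negative 3, sent_score -2
```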
There are a couple of places along the way where I simply stopped to look at the data - that is very helpful for understanding errors.
library(tidyverse)
library(rtweet)
library(tidytext)

# twitter scraping
ref <- search_tweets(
  "#refugee", n = 18000, include_rts = FALSE, lang = "en"
)

data(stop_words)
new_stops <- tibble(word = c("https", "t.co", "1", "refugee", "#refugee", "amp", "refugees",
                             "day", "2022", "dont", "0", "2", "@refugees", "4", "2021"),
                    lexicon = "sabs")
full_stop <- stop_words %>%
  bind_rows(new_stops)  # bind_rows appends rows (a way to merge data)

ref_w_sentiments <- ref %>%
  rowid_to_column("tweet") %>%
  unnest_tokens(word, text, token = "tweets") %>%
  anti_join(full_stop) %>%
  inner_join(get_sentiments("bing"))

# look at what the data looks like
select(ref_w_sentiments, tweet, word, sentiment)
#> # A tibble: 811 × 3
#>   tweet word      sentiment
#>   <int> <chr>     <chr>
#> 1     2 helping   positive
#> 2     3 inspiring positive
#> 3     4 support   positive

ref_w_scores <- ref_w_sentiments %>%
  group_by(tweet) %>%
  count(sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n, values_fill = 0) %>%
  mutate(sent_score = positive - negative) %>%
  # not all tweets were scored, so create a new index
  rowid_to_column("index")

# look at the data again
ref_w_scores
#> # A tibble: 418 × 5
#> # Groups:   tweet [418]
#>   index tweet positive negative sent_score
#>   <int> <int>    <int>    <int>      <int>
#> 1     1     2        1        0          1
#> 2     2     3        1        0          1
#> 3     3     4        1        0          1

ggplot(ref_w_scores) +  # plot the overall sentiment (pos - neg) versus index
  geom_col(aes(x = index, y = sent_score), show.legend = FALSE)