Aggregation and mean calculation with dplyr
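I have a chunk of code that aggregates the timestamps of a large dataset (see below). Each timestamp represents one tweet. The code aggregates the tweets per week and works well. I now also have a column containing a sentiment value for each tweet, and I would like to know whether I can calculate the mean sentiment of the tweets per week. Ideally I would end up with a dataset that contains the number of tweets per week and the mean sentiment of those aggregated tweets. If you have any hints, please let me know :)
Kind regards, Daniel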
weekly_counts_2 <- df_bw %>%
  drop_na(Timestamp) %>%
  mutate(weekly_cases = floor_date(
    Timestamp,
    unit = "week")) %>%
  count(weekly_cases) %>%
  tidyr::complete(
    weekly_cases = seq.Date(
      from = min(weekly_cases),
      to = max(weekly_cases),
      by = "week"),
    fill = list(n = 0))
Since no data was shared, it is hard to verify the answer, but based on the description provided here, this is a solution you can try.
library(dplyr)
library(tidyr)
library(lubridate)
weekly_counts_2 <- df_bw %>%
  drop_na(Timestamp) %>%
  mutate(weekly_cases = floor_date(Timestamp, unit = "week")) %>%
  group_by(weekly_cases) %>%
  # one row per week: mean sentiment and number of tweets
  summarise(mean_sentiment = mean(sentiment_value, na.rm = TRUE),
            count = n()) %>%
  # add weeks without tweets; the count column is named `count`, so fill that one
  complete(weekly_cases = seq.Date(min(weekly_cases),
                                   max(weekly_cases), by = "week"),
           fill = list(count = 0))
I assumed that the column with the sentiment values is called sentiment_value; please change it to match your data.
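Because no data was shared, here is a minimal sketch on made-up data that may help to check the approach. The toy values in df_bw and the result name weekly_summary are assumptions for illustration only, and Timestamp is assumed to be of class Date (which seq.Date requires).

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data: one row per tweet. Column names follow the question,
# the values are invented. The row with a missing Timestamp is dropped below.
df_bw <- tibble(
  Timestamp = as.Date(c("2021-01-04", "2021-01-05", "2021-01-20", NA)),
  sentiment_value = c(0.5, -0.1, 0.3, 0.9)
)

weekly_summary <- df_bw %>%
  drop_na(Timestamp) %>%
  # floor each date to the start of its week
  mutate(weekly_cases = floor_date(Timestamp, unit = "week")) %>%
  group_by(weekly_cases) %>%
  summarise(
    mean_sentiment = mean(sentiment_value, na.rm = TRUE),
    count = n()
  ) %>%
  # insert weeks that had no tweets, with a tweet count of 0
  complete(
    weekly_cases = seq.Date(min(weekly_cases), max(weekly_cases), by = "week"),
    fill = list(count = 0)
  )

print(weekly_summary)
```

With this toy input, the week between the two weeks that contain tweets should appear with count 0 and mean_sentiment NA, since there is nothing to average. If a numeric placeholder is preferred for empty weeks, it could be added to the fill list as well, although NA is arguably the more honest value.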