Aggregation and mean calculation with dplyr

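I have a chunk of code that aggregates the timestamps of a large dataset (see below). Each timestamp represents one tweet. The code aggregates the tweets per week and works well. I also have a column containing the sentiment value of each tweet, and I would like to know whether it is possible to calculate the average sentiment of the tweets in each week. Ideally, I would end up with one dataset containing the number of tweets per week and the mean sentiment of those aggregated tweets. If you have any tips, please let me know :)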

Kind regards, Daniel

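weekly_counts_2 <- df_bw %>% 
  drop_na(Timestamp) %>%                 # remove tweets with a missing timestamp
  mutate(weekly_cases = floor_date(      # new column: the week each tweet falls in
    Timestamp,
    unit = "week")) %>%                  # round each timestamp down to the start of its week
  count(weekly_cases) %>%                # number of tweets per week (column n)
  tidyr::complete(                       # add weeks that have no tweets at all
    weekly_cases = seq.Date(             # full sequence of weeks from first to last
      from = min(weekly_cases),
      to = max(weekly_cases),
      by = "week"),
    fill = list(n = 0))                  # the added empty weeks get a count of 0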
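Because every later step groups on weekly_cases, it may help to see what floor_date() with unit = "week" does. A minimal sketch with a few made-up dates (not from the question's data): each date is rounded down to the first day of its week, which in lubridate is a Sunday by default (week_start = 7, unless the lubridate.week.start option is set).

```r
library(lubridate)

# Hypothetical example dates; 2021-03-01 is a Monday
dates <- as.Date(c("2021-03-01", "2021-03-03", "2021-03-07", "2021-03-10"))

# Default behaviour: weeks start on Sunday
sunday_weeks <- floor_date(dates, unit = "week")
print(sunday_weeks)
# 2021-02-28 2021-02-28 2021-03-07 2021-03-07

# week_start = 1 makes weeks start on Monday instead
monday_weeks <- floor_date(dates, unit = "week", week_start = 1)
print(monday_weeks)
# 2021-03-01 2021-03-01 2021-03-01 2021-03-08
```

All tweets from the same week therefore share one weekly_cases value, and that shared value is what count() or group_by() aggregates on.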

Since no data was shared it is hard to verify the answer, but based on the description given here, you can try the following solution.

library(dplyr)
library(tidyr)
library(lubridate)

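weekly_counts_2 <- df_bw %>% 
  drop_na(Timestamp) %>%                                   # remove tweets with a missing timestamp
  mutate(weekly_cases = floor_date(Timestamp, unit = "week")) %>%  # start of each tweet's week
  group_by(weekly_cases) %>%                               # one group per week
  summarise(mean_sentiment = mean(sentiment_value, na.rm = TRUE),  # average sentiment per week
            count = n()) %>%                               # number of tweets per week
  complete(weekly_cases = seq.Date(min(weekly_cases),      # add weeks with no tweets
                                   max(weekly_cases), by = "week"),
           fill = list(count = 0))                         # their count is 0; mean_sentiment stays NA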

I assumed the column with the sentiment values is called sentiment_value; change it to match your data.
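Since no data was shared, here is a minimal, self-contained sketch for checking the approach on a tiny made-up data frame. The df_bw contents below, including the sentiment_value column, are hypothetical, and Timestamp is assumed to be a Date. Note that fill must name the column that summarise() creates (count); a name that does not exist, such as n, is silently ignored and the empty weeks end up with NA instead of 0.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data standing in for df_bw: one row per tweet
df_bw <- data.frame(
  Timestamp = as.Date(c("2021-03-01", "2021-03-03", "2021-03-04",
                        "2021-03-17", NA)),
  sentiment_value = c(0.5, -0.5, 1, 0.2, 0.9)
)

weekly_counts_2 <- df_bw %>%
  drop_na(Timestamp) %>%                                   # the NA row is dropped
  mutate(weekly_cases = floor_date(Timestamp, unit = "week")) %>%
  group_by(weekly_cases) %>%
  summarise(mean_sentiment = mean(sentiment_value, na.rm = TRUE),
            count = n()) %>%
  complete(weekly_cases = seq.Date(min(weekly_cases), max(weekly_cases),
                                   by = "week"),
           fill = list(count = 0))                         # fill the summarise() column

# If Timestamp were a POSIXct date-time instead, floor_date() would return
# POSIXct and seq.Date() would fail; wrapping floor_date() in as.Date()
# avoids that.
print(weekly_counts_2)
```

With these dates the result has three weeks (starting 2021-02-28, 2021-03-07 and 2021-03-14) with counts 3, 0 and 1; the empty middle week keeps NA as its mean sentiment, which is usually preferable to a made-up 0.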