How to convert a data frame with datetimes to a daily time series with mean aggregation in R

I have a data frame that looks like this:

Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00", "2009-01-01 10:25:00",
                     "2009-01-02 07:45:00", "2009-01-02 15:32:00", "2009-01-02 11:15:00",
                     "2009-01-02 12:35:00")
Cust_ID = c("1214", "2643", "31231", "41244", "1214", "15317", "51591")
Wait_Time_Mins = c("54","43","88","94","12","130", "170") 
df_have = data.frame(Arrival_DateTime, Cust_ID, Wait_Time_Mins)

I want to transform it so that I get the number of customer visits per day and the average wait time per day, so that it looks like this:

dates = c("2009-01-01", "2009-01-02")
num_visits = c("3", "4")
avg_wait_time = c("61.7","101.5")
df_want = data.frame(dates, num_visits, avg_wait_time)

How can I do this?

Similarly, is there a way to aggregate by month?

You can use dplyr, with lubridate to parse the timestamps:

library(dplyr)

df_have %>%
  mutate(Arrival_DateTime = lubridate::ymd_hms(Arrival_DateTime), 
         Date = as.Date(Arrival_DateTime), 
         #For monthly aggregation -
         #Date = format(Arrival_DateTime, '%Y-%m'), 
         Wait_Time_Mins = as.numeric(Wait_Time_Mins)) %>%
  group_by(Date) %>%
  summarise(num_visits = n(),  # counts rows (visits); n_distinct(Cust_ID) would count unique customers instead
            avg_wait_time = mean(Wait_Time_Mins))

#        Date num_visits avg_wait_time
#1 2009-01-01          3      61.66667
#2 2009-01-02          4     101.50000

Using aggregate():

aggregate(as.double(Wait_Time_Mins) ~ as.Date(Arrival_DateTime), df_have, 
          \(x) c(length(x), mean(x))) |>
  do.call(what=data.frame) |>
  setNames(c('date', 'num_visits', 'avg_wait_time'))
#         date num_visits avg_wait_time
# 1 2009-01-01          3      61.66667
# 2 2009-01-02          4     101.50000

Note: requires R >= 4.1, which introduced the \(x) lambda shorthand and the native pipe |>.
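For the monthly part of the question, the same aggregate() idea can be applied to a year-month key instead of a date. The following is a minimal base R sketch, not part of the original answer: the names arrival, df_month and monthly are made up for illustration, and tz = "UTC" is an assumption, since the question does not say which time zone the timestamps are in. It uses function(x) rather than \(x), so it should also run on R versions older than 4.1.

```r
# Same sample data as in the question, rebuilt so the block runs on its own.
df_have <- data.frame(
  Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00",
                       "2009-01-01 10:25:00", "2009-01-02 07:45:00",
                       "2009-01-02 15:32:00", "2009-01-02 11:15:00",
                       "2009-01-02 12:35:00"),
  Cust_ID = c("1214", "2643", "31231", "41244", "1214", "15317", "51591"),
  Wait_Time_Mins = c("54", "43", "88", "94", "12", "130", "170")
)

# Parse the timestamps once. tz = "UTC" is an assumption, not from the question.
arrival <- as.POSIXct(df_have$Arrival_DateTime, tz = "UTC")

# Helper frame (hypothetical name) with a "YYYY-MM" key and numeric wait times.
df_month <- data.frame(
  month = format(arrival, "%Y-%m"),
  wait  = as.numeric(df_have$Wait_Time_Mins)
)

# One row per month; the function returns both the count and the mean,
# which aggregate() stores as a two-column matrix inside the result.
monthly <- aggregate(wait ~ month, df_month,
                     function(x) c(num_visits = length(x), avg_wait_time = mean(x)))

# Flatten the matrix column into ordinary columns and give them plain names.
monthly <- do.call(data.frame, monthly)
names(monthly) <- c("month", "num_visits", "avg_wait_time")
monthly
```

With the sample data, all seven rows fall in 2009-01, so the result is a single row with 7 visits and a mean wait of 591/7, about 84.43 minutes. The "YYYY-MM" key is a character string; if a real Date is needed for plotting, one option is as.Date(paste0(month, "-01")).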


Data:

df_have <- structure(list(Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00", 
"2009-01-01 10:25:00", "2009-01-02 07:45:00", "2009-01-02 15:32:00", 
"2009-01-02 11:15:00", "2009-01-02 12:35:00"), Cust_ID = c("1214", 
"2643", "31231", "41244", "1214", "15317", "51591"), Wait_Time_Mins = c("54", 
"43", "88", "94", "12", "130", "170")), class = "data.frame", row.names = c(NA, 
-7L))