How to convert a data frame with datetimes to a daily time series with mean aggregation in R
I have a data frame that looks like this:
Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00", "2009-01-01 10:25:00",
"2009-01-02 07:45:00", "2009-01-02 15:32:00", "2009-01-02 11:15:00",
"2009-01-02 12:35:00")
Cust_ID = c("1214", "2643", "31231", "41244", "1214", "15317", "51591")
Wait_Time_Mins = c("54","43","88","94","12","130", "170")
df_have = data.frame(Arrival_DateTime, Cust_ID, Wait_Time_Mins)
I would like to transform it so that I get the number of customer visits per day and the average wait time per day, so that it looks like this:
dates = c("2009-01-01", "2009-01-02")
num_visits = c("3", "4")
avg_wait_time = c("61.7","101.5")
df_want = data.frame(dates, num_visits, avg_wait_time)
How can I do this?
Similarly, is there a way to aggregate by month?
You can use dplyr, with lubridate to parse the datetimes:
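library(dplyr)
df_have %>%
  mutate(Arrival_DateTime = lubridate::ymd_hms(Arrival_DateTime),
         Date = as.Date(Arrival_DateTime),
         # For monthly aggregation, use this instead of the line above:
         # Date = format(Arrival_DateTime, '%Y-%m'),
         Wait_Time_Mins = as.numeric(Wait_Time_Mins)) %>%
  group_by(Date) %>%
  # n_distinct() counts unique customers per day; use n() instead if a
  # customer who arrives twice on the same day should count as two visits.
  summarise(num_visits = n_distinct(Cust_ID),
            avg_wait_time = mean(Wait_Time_Mins))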
# Date num_visits avg_wait_time
#1 2009-01-01 3 61.66667
#2 2009-01-02 4 101.50000
Using aggregate():
aggregate(as.double(Wait_Time_Mins) ~ as.Date(Arrival_DateTime), df_have,
          \(x) c(length(x), mean(x))) |>
  do.call(what=data.frame) |>
  setNames(c('date', 'num_visits', 'avg_wait_time'))
# date num_visits avg_wait_time
# 1 2009-01-01 3 61.66667
# 2 2009-01-02 4 101.50000
Note: requires R >= 4.1, for the \(x) lambda shorthand and the native pipe |>.
Data:
df_have <- structure(list(Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00",
"2009-01-01 10:25:00", "2009-01-02 07:45:00", "2009-01-02 15:32:00",
"2009-01-02 11:15:00", "2009-01-02 12:35:00"), Cust_ID = c("1214",
"2643", "31231", "41244", "1214", "15317", "51591"), Wait_Time_Mins = c("54",
"43", "88", "94", "12", "130", "170")), class = "data.frame", row.names = c(NA,
-7L))
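For the monthly part of the question, the same aggregate() idea can be adapted by grouping on a year-month key instead of the date. The following is only a sketch under a few assumptions: the helper column names month and wait are made up for illustration, the timestamps are assumed to be ISO-formatted strings as in the sample data, and an ordinary function(x) is used so that it should also run on R versions before 4.1.

```r
# Same sample data as in the question, rebuilt so the sketch is self-contained.
df_have <- data.frame(
  Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00",
                       "2009-01-01 10:25:00", "2009-01-02 07:45:00",
                       "2009-01-02 15:32:00", "2009-01-02 11:15:00",
                       "2009-01-02 12:35:00"),
  Cust_ID = c("1214", "2643", "31231", "41244", "1214", "15317", "51591"),
  Wait_Time_Mins = c("54", "43", "88", "94", "12", "130", "170")
)

# Hypothetical helper columns: a "YYYY-MM" grouping key and a numeric wait time.
df_have$month <- format(as.Date(df_have$Arrival_DateTime), "%Y-%m")
df_have$wait  <- as.numeric(df_have$Wait_Time_Mins)

# One row per month; FUN returns two statistics, which aggregate() stores
# together in a single matrix column.
monthly <- aggregate(wait ~ month, data = df_have,
                     FUN = function(x) c(length(x), mean(x)))

# Flatten the matrix column into ordinary columns and give them readable names.
monthly <- do.call(data.frame, monthly)
names(monthly) <- c("month", "num_visits", "avg_wait_time")
monthly
```

With the sample data, all seven arrivals fall in 2009-01, so the result is a single row with 7 visits and a mean wait of 591/7, roughly 84.43 minutes. As in the daily version, length(x) counts every row, so a customer who arrives several times in a month contributes several visits.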