Plotting a "Popular times" graph in R
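I would like to create a graph showing the busy hours at the hostel where I work. Ideally, I would be able to create two density curves (or sets of bars), one for regular weekdays and one for weekends and holidays. All I have is the check-in and check-out times of all our clients, nothing more.
I picture something similar to this type of chart (keep in mind, though, that here people stay overnight).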
Popular times - Google
Any ideas on how to approach this?
First, let's generate some check-in/check-out data:
library(tidyverse)
library(lubridate)
nclients <- 40
startinterval <- '2021/11/04'
endinterval <- '2021/11/07'
set.seed(1236)
data <- tibble(client = 1:nclients,
               checkin = ymd_hms(sample(seq(as.POSIXct(startinterval),
                                            as.POSIXct(endinterval),
                                            by = "sec"),
                                        nclients,
                                        replace = TRUE))
)
data <- data %>%
  mutate(inte = floor(sample(x = 1800:28800, size = nrow(data), replace = TRUE)),
         checkout = checkin + inte) %>%
  select(-inte)
head(data)
# # A tibble: 6 x 3
#   client checkin             checkout
#    <int> <dttm>              <dttm>
# 1      1 2021-11-06 23:21:22 2021-11-07 06:14:03
# 2      2 2021-11-04 19:22:20 2021-11-04 22:46:54
# 3      3 2021-11-06 21:44:56 2021-11-07 04:22:11
# 4      4 2021-11-05 04:32:33 2021-11-05 09:32:05
# 5      5 2021-11-05 13:27:55 2021-11-05 15:34:22
# 6      6 2021-11-04 15:31:23 2021-11-04 22:41:26
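Next, we need a function that turns this data into the hours during which each client is registered (that is, between check-in and check-out). Adapting part of Bas's answer, we get: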
whathours <- function(start_time, end_time) {
  # Round outwards to whole hours so partially occupied hours are counted
  start_hour <- floor_date(start_time, unit = "hour")
  end_hour <- ceiling_date(end_time, unit = "hour")
  # units must be named: the third positional argument of difftime() is tz
  diff_hours <- as.double(difftime(end_hour, start_hour, units = "hours"))
  hours <- start_hour + hours(0:diff_hours)
  # Drop the last element: it marks the end of the final hour, not its start
  hours <- hours[1:(length(hours) - 1)]
  tibble(Day = date(hours),
         HourOfDay = hour(hours))
}
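To see what this expansion produces for a single stay, here is a minimal sketch applied to client 1 from the sample data. `stay_hours()` is a hypothetical, simplified helper written for this illustration (it is not the function from the answer) and it builds the hour sequence with base `seq()`; it includes the hour in which check-out happens.

```r
library(lubridate)

# Hypothetical helper for illustration only: list every clock hour touched
# by a stay, from the hour of check-in through the hour of check-out.
stay_hours <- function(start_time, end_time) {
  first_hour <- floor_date(start_time, unit = "hour")
  last_hour <- floor_date(end_time, unit = "hour")
  hrs <- seq(first_hour, last_hour, by = "hour")
  data.frame(Day = as_date(hrs), HourOfDay = hour(hrs))
}

# Client 1: checks in at 23:21 on 6 November, checks out at 06:14 on 7 November
stay <- stay_hours(ymd_hms("2021-11-06 23:21:22"),
                   ymd_hms("2021-11-07 06:14:03"))
print(stay)
```

The overnight stay becomes eight rows: hour 23 on the check-in day, then hours 0 to 6 on the following day, which is what lets a stay count towards the busy hours of two different dates.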
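Mapping the function over each stay, we then build a dataset that counts the clients registered in the hostel for each hour of the day, split by type of day as requested: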
data2 <- data %>%
  mutate(start_time = as_datetime(checkin),
         end_time = as_datetime(checkout)) %>%
  mutate(infoperhour = purrr::map2(start_time, end_time, whathours)) %>%
  unnest(infoperhour) %>%
  # week_start = 1 numbers Monday as 1, so 1:5 is Monday to Friday
  mutate(typeofday = ifelse(wday(Day, week_start = 1) %in% 1:5, "weekday", "weekend")) %>%
  count(typeofday, HourOfDay) %>%
  arrange(typeofday, HourOfDay)
head(data2)
# Output as originally posted, where days were classified with wday(Day) %in% 1:5
# (default numbering, 1 = Sunday); counts may differ with Monday-based numbering.
# # A tibble: 6 x 3
#   typeofday HourOfDay     n
#   <chr>         <int> <int>
# 1 weekday           0     3
# 2 weekday           1     3
# 3 weekday           2     5
# 4 weekday           3     5
# 5 weekday           4     5
# 6 weekday           5     4
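The weekday/weekend split depends on how `wday()` numbers the days: by default lubridate counts from Sunday (1 = Sunday, 7 = Saturday). A small check on the four dates covered by the sample data, assuming Monday-based numbering via `week_start = 1`:

```r
library(lubridate)

days <- ymd(c("2021-11-04", "2021-11-05", "2021-11-06", "2021-11-07"))

# With week_start = 1, Monday is 1 and Sunday is 7,
# so 1:5 covers Monday to Friday and 6:7 is the weekend.
day_number <- wday(days, week_start = 1)
typeofday <- ifelse(day_number %in% 1:5, "weekday", "weekend")

print(data.frame(day = days, day_number, typeofday))
```

Here Thursday and Friday come out as weekdays and Saturday and Sunday as weekend days. Holidays are not handled in this sketch; one option would be to check `Day` against a vector of holiday dates before applying the rule.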
Finally, we plot the data:
data2 %>%
  # weight = n makes the density reflect the hourly counts, not just which hours occur
  ggplot(aes(x = HourOfDay, weight = n, color = typeofday)) +
  geom_density() +
  scale_x_continuous(limits = c(0, 23), breaks = seq(0, 23, by = 1)) +
  theme_classic()
Final plot
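Since the question also mentions bars, a bar version closer to the Google "Popular times" look could be sketched as follows. The `toy` data frame holds made-up counts purely for illustration and only mimics the shape of `data2` (columns `typeofday`, `HourOfDay`, `n`); the axis labels are my own wording.

```r
library(ggplot2)

# Made-up counts for illustration only; in practice pass data2 instead of toy
toy <- data.frame(
  typeofday = rep(c("weekday", "weekend"), each = 4),
  HourOfDay = rep(c(20, 21, 22, 23), times = 2),
  n = c(2, 3, 5, 4, 4, 6, 7, 7)
)

# One bar per hour and type of day, with the two types placed side by side
p <- ggplot(toy, aes(x = HourOfDay, y = n, fill = typeofday)) +
  geom_col(position = "dodge") +
  scale_x_continuous(breaks = 0:23) +
  labs(x = "Hour of day", y = "Clients registered") +
  theme_classic()
print(p)
```

Bars show the raw counts per hour, whereas `geom_density()` smooths them and rescales each curve to integrate to one, so the bar version is easier to read when the absolute number of clients matters.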