Plotting a "Popular times" graph in R

Question

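I want to create a chart showing the busy times at the hostel where I work. Ideally, I would be able to create two density curves (or sets of bars), one for regular weekdays and one for weekends and holidays. I have the check-in and check-out times of all our clients, and nothing else.

I'm picturing something similar to this type of chart (keep in mind, though, that people stay overnight here).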

Popular times - Google

Any ideas on how to approach this?

Answer 1

First, let's generate some check-in/check-out data:

library(tidyverse)
library(lubridate)

nclients <- 40
startinterval <- '2021/11/04'
endinterval <- '2021/11/07'


set.seed(1236)
data <- tibble(client = 1:nclients,
               checkin = ymd_hms(sample(seq(as.POSIXct(startinterval),
                                            as.POSIXct(endinterval),
                                            by="sec"),
                                        nclients,
                                        replace = TRUE))
)


data <- data %>%
  # Stay length in seconds: between 30 minutes (1800) and 8 hours (28800)
  mutate(inte = sample(x = 1800:28800, size = nrow(data), replace = TRUE),
         checkout = checkin + inte) %>%
  select(-inte)

head(data)

# # A tibble: 6 x 3
# client checkin             checkout
# <int> <dttm>              <dttm>
# 1      1 2021-11-06 23:21:22 2021-11-07 06:14:03
# 2      2 2021-11-04 19:22:20 2021-11-04 22:46:54
# 3      3 2021-11-06 21:44:56 2021-11-07 04:22:11
# 4      4 2021-11-05 04:32:33 2021-11-05 09:32:05
# 5      5 2021-11-05 13:27:55 2021-11-05 15:34:22
# 6      6 2021-11-04 15:31:23 2021-11-04 22:41:26
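
With real records instead of simulated ones, the timestamps would first have to be parsed into date-times. A minimal sketch, assuming the raw data arrives as text in "YYYY-MM-DD HH:MM:SS" format; the `raw` data frame and its column names are hypothetical, and the two rows are borrowed from the simulated output above:

```r
library(lubridate)

# Hypothetical raw records; column names and timestamp format are assumptions
raw <- data.frame(
  client   = 1:2,
  checkin  = c("2021-11-04 19:22:20", "2021-11-06 23:21:22"),
  checkout = c("2021-11-04 22:46:54", "2021-11-07 06:14:03"),
  stringsAsFactors = FALSE
)

# Parse the text columns into POSIXct date-times
raw$checkin  <- ymd_hms(raw$checkin, tz = "UTC")
raw$checkout <- ymd_hms(raw$checkout, tz = "UTC")

# Basic sanity checks before going further
stopifnot(!anyNA(raw$checkin), !anyNA(raw$checkout),
          all(raw$checkout > raw$checkin))

# Length of each stay in hours
raw$stay_hours <- as.numeric(difftime(raw$checkout, raw$checkin, units = "hours"))
raw
```

If the hostel records local times, `tz` would need to be set to the local time zone instead of "UTC".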

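Next we need a function that converts these data into the hours during which each client was registered (that is, between check-in and check-out). Adapting part of Bas's answer, we get: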

whathours <- function(start_time, end_time) {
  # Round outwards so that partially occupied hours are included
  start_hour <- floor_date(start_time, unit = "hour")
  end_hour <- ceiling_date(end_time, unit = "hour")
  # `units` must be named: the third positional argument of difftime() is `tz`
  diff_hours <- as.double(difftime(end_hour, start_hour, units = "hours"))

  # One timestamp per hour boundary; drop the last one, which only closes the final hour
  hours <- start_hour + hours(0:diff_hours)
  hours <- head(hours, -1)

  tibble(Day = date(hours),
         HourOfDay = hour(hours))
}
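
To see what this expansion produces for a single stay, here is a small standalone sketch of the same idea. It uses `seq()` rather than the function above, and the variable names are only illustrative; the timestamps are those of client 1 in the sample output:

```r
library(lubridate)

checkin  <- ymd_hms("2021-11-06 23:21:22", tz = "UTC")
checkout <- ymd_hms("2021-11-07 06:14:03", tz = "UTC")

# Start of every clock hour the stay touches: from the hour containing the
# check-in up to the hour containing the check-out
occupied <- seq(floor_date(checkin, unit = "hour"),
                ceiling_date(checkout, unit = "hour") - 3600,
                by = "hour")

stay_hours <- data.frame(Day = as_date(occupied),
                         HourOfDay = hour(occupied))
stay_hours
```

The stay spans eight clock hours (23:00 on 6 November through 06:00 on 7 November), so this client contributes one row to each of those hours.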

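Mapping this function over the data, we then build a dataset that counts the clients registered in the hostel for each hour of the day, split by type of day as requested: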

data2 <- data %>%
  mutate(start_time = as_datetime(checkin),
         end_time = as_datetime(checkout)) %>%
  # One row per client and occupied hour
  mutate(infoperhour = purrr::map2(start_time, end_time, whathours)) %>%
  unnest(infoperhour) %>%
  # week_start = 1 makes Monday 1 and Sunday 7, so 1:5 is Monday to Friday
  # (with the default numbering, Sunday is 1 and would count as a weekday)
  mutate(typeofday = if_else(wday(Day, week_start = 1) %in% 1:5,
                             "weekday", "weekend")) %>%
  # Number of client-hours per type of day and hour of day
  count(typeofday, HourOfDay) %>%
  arrange(typeofday, HourOfDay)
head(data2)

# Example output (exact counts depend on the sampled data and the day classification):
# # A tibble: 6 x 3
# typeofday HourOfDay     n
# <chr>         <int> <int>
# 1 weekday           0     3
# 2 weekday           1     3
# 3 weekday           2     5
# 4 weekday           3     5
# 5 weekday           4     5
# 6 weekday           5     4
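
The weekday/weekend split depends on how `wday()` numbers the days: by default Sunday is 1 and Saturday is 7, whereas `week_start = 1` makes Monday 1 and Sunday 7. The question also asks for holidays to be grouped with weekends. A minimal sketch of one way to do that, assuming a hand-maintained `holidays` vector (hypothetical, as is the date in it):

```r
library(lubridate)

# The four days covered by the simulated data: Thursday to Sunday
days <- as_date(c("2021-11-04", "2021-11-05", "2021-11-06", "2021-11-07"))

# Hypothetical holiday list; replace with the dates that apply to the hostel
holidays <- as_date("2021-11-05")

# With week_start = 1, Saturday is 6 and Sunday is 7
is_off <- wday(days, week_start = 1) >= 6 | days %in% holidays
typeofday <- ifelse(is_off, "weekend/holiday", "weekday")

data.frame(days, typeofday)
```

Here only Thursday 4 November is classed as a regular weekday, because Friday is in the hypothetical holiday list.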

Finally, we plot the data:

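data2 %>%
  # weight = n makes each hour contribute in proportion to its client count;
  # without it the density would ignore n and treat every hour equally
  ggplot(aes(x = HourOfDay, weight = n, color = typeofday)) +
  geom_density() +
  scale_x_continuous(limits = c(0, 23), breaks = seq(0, 23, by = 1)) +
  theme_classic()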
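
Since the question also mentions bars, a layout closer to Google's "Popular times" could be sketched with `geom_col()` and one panel per type of day. The `occupancy` data frame below holds made-up illustrative counts, not the hostel data; in practice a table with the same three columns as `data2` would take its place:

```r
library(ggplot2)

# Illustrative counts only: one row per type of day and hour of day
occupancy <- expand.grid(HourOfDay = 0:23,
                         typeofday = c("weekday", "weekend"),
                         stringsAsFactors = FALSE)
occupancy$n <- round(5 + 4 * cos((occupancy$HourOfDay - 2) * pi / 12) +
                       ifelse(occupancy$typeofday == "weekend", 2, 0))

# One bar per hour, one panel per type of day
p <- ggplot(occupancy, aes(x = HourOfDay, y = n, fill = typeofday)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ typeofday, ncol = 1) +
  scale_x_continuous(breaks = seq(0, 23, by = 3)) +
  labs(x = "Hour of day", y = "Guests present") +
  theme_classic()
p
```

Bars show the counts directly, whereas the density curve shows only the shape of the distribution. If the two types of day cover different numbers of days, dividing `n` by the number of days of each type would make the panels comparable.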

Final plot
