How to compute overlapping time intervals between groups

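I have a data frame containing a group ID, a start time, and an end time for each row. I want to compute the overlapping time intervals between groups. Here is a sample of the dataset: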

id <- c("a","a","b","c","c")
start_time <-as.POSIXct(c("2016-05-27 09:30:00","2016-05-27 15:30:00",
                          "2016-05-27 14:30:00","2016-05-27 09:40:00","2016-05-27 15:00:00"),tz= "UTC")
end_time <-as.POSIXct(c("2016-05-27 10:30:00","2016-05-27 17:30:00",
                        "2016-05-27 16:30:00","2016-05-27 09:50:00","2016-05-27 16:00:00"),tz= "UTC")

df <- data.frame(id,start_time,end_time)

The sample data frame looks like this:

            id             start_time           end_time
1           a        2016-05-27 09:30:00    2016-05-27 10:30:00
2           a        2016-05-27 15:30:00    2016-05-27 17:30:00
3           b        2016-05-27 14:30:00    2016-05-27 16:30:00
4           c        2016-05-27 09:40:00    2016-05-27 09:50:00
5           c        2016-05-27 15:00:00    2016-05-27 16:00:00

The desired result for this data frame is:

            ID_1             ID_2        overlap
1           a                 b         0 + 60 mins
2           a                 c        10 + 0 + 0 + 30 mins
3           b                 c         0 + 60 mins

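The last column does not have to list every case; the breakdown is only there to make the example easier to follow. Is there a way to compute the total overlap time between groups by comparing all of their time intervals?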

Here you go:



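library(dplyr)      # mutate(), full_join(), filter(), group_by(), summarise()
library(magrittr)   # %<>%
library(lubridate)  # interval(), intersect(), int_length()
library(tidyr)      # replace_na()

# One Interval object per row
df %<>% mutate( interval = interval( start_time, end_time ) )

# Cross join of df with itself; dplyr >= 1.1.0 deprecates by=character() in favour of cross_join()
df %>% full_join( df, by=character(), suffix=c("_1","_2") ) %>%
    # overlap length in seconds; NA when the two intervals do not intersect
    mutate( overlap = int_length( lubridate::intersect( interval_1, interval_2 ) ) ) %>%
    filter( id_1 < id_2 ) %>%   # keep each pair of groups once
    replace_na( list(overlap=0) ) %>%
    group_by( id_1, id_2 ) %>%
    summarise( overlap = paste(paste( overlap / 60, collapse=" + " ),"mins"))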

The lubridate functions (interval() and intersect()) are the key to the solution; everything else is just plumbing.

Output:


  id_1  id_2  overlap              
  <chr> <chr> <chr>                
1 a     b     0 + 60 mins          
2 a     c     10 + 0 + 0 + 30 mins
3 b     c     0 + 60 mins
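
Since the question asks for the total overlap per pair of groups, here is a minimal sketch that sums the pieces into a single number of minutes. It is a base-R alternative to the pipeline above, not the same code: it uses merge() with by = NULL for the cross join and computes each overlap as max(0, min(end) - max(start)). The names pairs, latest_start, earliest_end, overlap_mins and totals are made up for this illustration.

```r
# Sample data from the question
df <- data.frame(
  id = c("a", "a", "b", "c", "c"),
  start_time = as.POSIXct(c("2016-05-27 09:30:00", "2016-05-27 15:30:00",
                            "2016-05-27 14:30:00", "2016-05-27 09:40:00",
                            "2016-05-27 15:00:00"), tz = "UTC"),
  end_time = as.POSIXct(c("2016-05-27 10:30:00", "2016-05-27 17:30:00",
                          "2016-05-27 16:30:00", "2016-05-27 09:50:00",
                          "2016-05-27 16:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Cross join: by = NULL makes merge() return the Cartesian product
pairs <- merge(df, df, by = NULL, suffixes = c("_1", "_2"))

# Keep each unordered pair of distinct groups once
pairs <- pairs[pairs$id_1 < pairs$id_2, ]

# Work in seconds since the epoch so pmax()/pmin() operate on plain numbers
latest_start <- pmax(as.numeric(pairs$start_time_1), as.numeric(pairs$start_time_2))
earliest_end <- pmin(as.numeric(pairs$end_time_1), as.numeric(pairs$end_time_2))

# A negative difference means the two intervals do not overlap
pairs$overlap_mins <- pmax(0, earliest_end - latest_start) / 60

# Sum the overlaps for each pair of groups
totals <- aggregate(overlap_mins ~ id_1 + id_2, data = pairs, FUN = sum)
totals <- totals[order(totals$id_1, totals$id_2), ]
print(totals)
```

For the sample data this gives 60 minutes for a/b, 40 for a/c and 60 for b/c. The sketch assumes that intervals within a single group do not overlap each other; if they do, the shared time is counted more than once.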