R: Join lists with group_by and summarize

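Suppose I have a call center with several lines. For each call, I know the line, the timestamp at which the call was taken (t_Calltaken), and the time at which it ended (t_Hangup).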

library(tidyverse)
library(lubridate)

Callcenter <- data.frame(line=c("A","B","C","D","A","B","D","A","C","D"),
           t_Calltaken=c("2019-01-01 00:10:50", "2019-01-01 00:12:30","2019-01-01 00:17:00","2019-01-01 00:20:50","2019-01-01 00:35:20","2019-01-01 00:42:50","2019-01-01 00:48:50","2019-01-01 01:03:20","2019-01-01 01:10:50","2019-01-01 01:23:50"),
           t_Hangup=c("2019-01-01 00:33:10", "2019-01-01 00:35:10","2019-01-01 01:07:33","2019-01-01 00:38:50","2019-01-01 00:49:27","2019-01-01 01:22:40","2019-01-01 01:10:41","2019-01-01 01:26:10","2019-01-01 01:47:44","2019-01-01 01:51:15"))

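I now want to find the maximum number of lines that were occupied at the same time over the course of a year. A resolution of one minute is fine. So I computed the difference in minutes between t_Calltaken (and t_Hangup) and the start of the year (e.g. 2019-01-01 00:00:00) to get something like a minute ID.

For each call, I can then get the blocked minute IDs with seq(start_minute_id, end_minute_id, by=1).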

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1)))

# A tibble: 10 × 6
# Rowwise: 
   line  t_Calltaken         t_Hangup            start_minute_id end_minute_id blocked_minutes
   <chr> <chr>               <chr>                         <dbl>         <dbl> <list>         
 1 A     2019-01-01 00:10:50 2019-01-01 00:33:10              11            33 <dbl [23]>     
 2 B     2019-01-01 00:12:30 2019-01-01 00:35:10              12            35 <dbl [24]>     
 3 C     2019-01-01 00:17:00 2019-01-01 01:07:33              17            68 <dbl [52]>     
 4 D     2019-01-01 00:20:50 2019-01-01 00:38:50              21            39 <dbl [19]>     
 5 A     2019-01-01 00:35:20 2019-01-01 00:49:27              35            49 <dbl [15]>     
 6 B     2019-01-01 00:42:50 2019-01-01 01:22:40              43            83 <dbl [41]>     
 7 D     2019-01-01 00:48:50 2019-01-01 01:10:41              49            71 <dbl [23]>     
 8 A     2019-01-01 01:03:20 2019-01-01 01:26:10              63            86 <dbl [24]>     
 9 C     2019-01-01 01:10:50 2019-01-01 01:47:44              71           108 <dbl [38]>     
10 D     2019-01-01 01:23:50 2019-01-01 01:51:15              84           111 <dbl [28]>  

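I now want to group by line and concatenate all the lists of blocked minute IDs.

How can I do that?

In the next step, I want to count how often each blocked minute ID occurs, to get the maximum number of lines blocked in parallel. Is there a more efficient way to do this?

Edit:

I would like to get an output like this, for example: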

  line
1    A
2    B
3    C
4    D
                                                                                                                                                                                                                                                                                                                                                                     blocked_minutes
1                                                                                                                          c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86)
2                                                                                                              c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83)
3 c(17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108)
4      c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)

Not sure if this is what you want, but:

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>% 
  unnest_longer(blocked_minutes) %>% 
  group_by(line) %>% 
  nest() %>% 
  unnest_wider(col = data)

# A tibble: 4 x 6
# Groups:   line [4]
  line  t_Calltaken t_Hangup   start_minute_id end_minute_id blocked_minutes
  <chr> <list>      <list>     <list>          <list>        <list>         
1 A     <chr [62]>  <chr [62]> <dbl [62]>      <dbl [62]>    <dbl [62]>     
2 B     <chr [65]>  <chr [65]> <dbl [65]>      <dbl [65]>    <dbl [65]>     
3 C     <chr [90]>  <chr [90]> <dbl [90]>      <dbl [90]>    <dbl [90]>     
4 D     <chr [70]>  <chr [70]> <dbl [70]>      <dbl [70]>    <dbl [70]>  
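
If the goal is the single combined vector per line shown in the question's edit, a shorter route might be to concatenate the list column inside summarise(). The following is only a minimal sketch under a few assumptions: it parses the timestamps as UTC (the question does not state a time zone), it uses Map() instead of rowwise(), and `origin` and `per_line` are names introduced here for illustration.

```r
library(dplyr)

# Same sample data as in the question
Callcenter <- data.frame(
  line = c("A","B","C","D","A","B","D","A","C","D"),
  t_Calltaken = c("2019-01-01 00:10:50", "2019-01-01 00:12:30", "2019-01-01 00:17:00",
                  "2019-01-01 00:20:50", "2019-01-01 00:35:20", "2019-01-01 00:42:50",
                  "2019-01-01 00:48:50", "2019-01-01 01:03:20", "2019-01-01 01:10:50",
                  "2019-01-01 01:23:50"),
  t_Hangup = c("2019-01-01 00:33:10", "2019-01-01 00:35:10", "2019-01-01 01:07:33",
               "2019-01-01 00:38:50", "2019-01-01 00:49:27", "2019-01-01 01:22:40",
               "2019-01-01 01:10:41", "2019-01-01 01:26:10", "2019-01-01 01:47:44",
               "2019-01-01 01:51:15"),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are treated as UTC so that no DST shift can occur
origin <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC")

per_line <- Callcenter %>%
  mutate(
    start_minute_id = round(as.numeric(
      difftime(as.POSIXct(t_Calltaken, tz = "UTC"), origin, units = "mins"))),
    end_minute_id = round(as.numeric(
      difftime(as.POSIXct(t_Hangup, tz = "UTC"), origin, units = "mins"))),
    # Map() builds one sequence of minute IDs per call, so rowwise() is not needed
    blocked_minutes = Map(seq, start_minute_id, end_minute_id)
  ) %>%
  group_by(line) %>%
  # unlist() concatenates all per-call vectors of a line into one vector
  summarise(blocked_minutes = list(unlist(blocked_minutes)), .groups = "drop")

per_line
```

This should give one row per line with a single list element holding all blocked minute IDs of that line; `lengths(per_line$blocked_minutes)` can be used to check the sizes against the nested output above.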

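Thanks to @Ben G's answer, I was able to find a solution for my longer-term goal, which is to find the maximum number of lines blocked at the same time.

With unnest_longer(), the lists in blocked_minutes can be unnested, and after that I only had to reshape the data frame to get the result.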

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>% 
  unnest_longer(blocked_minutes) %>% 
  mutate(value=1) %>% 
  pivot_wider(id_cols=blocked_minutes, names_from="line",values_from="value") %>% 
  mutate(sum_blocked_lines=rowSums(.[,2:ncol(.)],na.rm=TRUE)) %>% 
  summarize(max_blocked_lines=max(sum_blocked_lines))

Result:

# A tibble: 1 × 1
  max_blocked_lines
              <dbl>
1                 4
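
On the question of a more efficient way: the following sketch shows two possible alternatives to the pivot_wider() step, and is not meant as the definitive solution. The first counts rows per minute ID directly. The second avoids expanding the minutes at all by treating each call as a +1 event at its start minute and a -1 event at the minute after its end, then taking a running sum. The names `origin`, `calls`, `events`, `max_by_count` and `max_by_sweep` are introduced here for illustration, and parsing the timestamps as UTC is an assumption.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
Callcenter <- data.frame(
  line = c("A","B","C","D","A","B","D","A","C","D"),
  t_Calltaken = c("2019-01-01 00:10:50", "2019-01-01 00:12:30", "2019-01-01 00:17:00",
                  "2019-01-01 00:20:50", "2019-01-01 00:35:20", "2019-01-01 00:42:50",
                  "2019-01-01 00:48:50", "2019-01-01 01:03:20", "2019-01-01 01:10:50",
                  "2019-01-01 01:23:50"),
  t_Hangup = c("2019-01-01 00:33:10", "2019-01-01 00:35:10", "2019-01-01 01:07:33",
               "2019-01-01 00:38:50", "2019-01-01 00:49:27", "2019-01-01 01:22:40",
               "2019-01-01 01:10:41", "2019-01-01 01:26:10", "2019-01-01 01:47:44",
               "2019-01-01 01:51:15"),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are treated as UTC so that no DST shift can occur
origin <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC")

calls <- Callcenter %>%
  mutate(
    start_minute_id = round(as.numeric(
      difftime(as.POSIXct(t_Calltaken, tz = "UTC"), origin, units = "mins"))),
    end_minute_id = round(as.numeric(
      difftime(as.POSIXct(t_Hangup, tz = "UTC"), origin, units = "mins")))
  )

# Variant 1: one row per (call, minute), then count lines per minute
max_by_count <- calls %>%
  mutate(blocked_minutes = Map(seq, start_minute_id, end_minute_id)) %>%
  unnest_longer(blocked_minutes) %>%
  # guard: count a line only once per minute, even if two of its calls
  # round to the same minute ID
  distinct(line, blocked_minutes) %>%
  count(blocked_minutes, name = "blocked_lines") %>%
  summarise(max_blocked_lines = max(blocked_lines)) %>%
  pull(max_blocked_lines)

# Variant 2: sweep over start/end events without expanding every minute
events <- tibble(
  minute = c(calls$start_minute_id, calls$end_minute_id + 1),
  delta  = c(rep(1, nrow(calls)), rep(-1, nrow(calls)))
) %>%
  group_by(minute) %>%
  summarise(delta = sum(delta), .groups = "drop") %>%  # net change per minute
  arrange(minute) %>%
  mutate(busy = cumsum(delta))                         # calls active from this minute on

max_by_sweep <- max(events$busy)

c(count = max_by_count, sweep = max_by_sweep)
```

For the sample data both variants give 4, matching the result above. Note that the sweep variant counts concurrent calls rather than distinct lines, so it only agrees with the first variant as long as two calls on the same line never share a minute ID. Its advantage is that it needs two rows per call instead of one row per blocked minute, which may matter for a full year of data.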