R: Join lists with group_by and summarize
Suppose my call center has several lines. For each call, I know which line it came in on, when it was taken (t_Calltaken), and when it ended (t_Hangup).
library(tidyverse)
library(lubridate)
Callcenter <- data.frame(line=c("A","B","C","D","A","B","D","A","C","D"),
t_Calltaken=c("2019-01-01 00:10:50", "2019-01-01 00:12:30","2019-01-01 00:17:00","2019-01-01 00:20:50","2019-01-01 00:35:20","2019-01-01 00:42:50","2019-01-01 00:48:50","2019-01-01 01:03:20","2019-01-01 01:10:50","2019-01-01 01:23:50"),
t_Hangup=c("2019-01-01 00:33:10", "2019-01-01 00:35:10","2019-01-01 01:07:33","2019-01-01 00:38:50","2019-01-01 00:49:27","2019-01-01 01:22:40","2019-01-01 01:10:41","2019-01-01 01:26:10","2019-01-01 01:47:44","2019-01-01 01:51:15"))
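I now want to find the maximum number of lines occupied at the same time over a whole year. A resolution of one minute is good enough. So I compute the difference in minutes between t_Calltaken / t_Hangup and the start of the year (here 2019-01-01 00:00:00) to get something like a minute ID.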
For each call, I can then use seq(start_minute_id, end_minute_id, by = 1) to get the blocked minute IDs.
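A minimal base-R sketch of the minute-ID step, assuming the timestamps are recorded in UTC (the post does not say which time zone they use). Parsing them explicitly with `as.POSIXct(..., tz = "UTC")` avoids ambiguous or non-existent local times around daylight-saving transitions, which matters once the data span a full year. The helper name `minute_id` is hypothetical.

```r
# Hypothetical helper: whole minutes elapsed since a reference instant.
# Assumption: all timestamps are UTC; change tz if they are local times.
minute_id <- function(ts, origin = "2019-01-01 00:00:00", tz = "UTC") {
  t  <- as.POSIXct(ts, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  t0 <- as.POSIXct(origin, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  # round() mirrors the question's choice of the nearest minute
  round(as.numeric(difftime(t, t0, units = "mins")))
}

# First four calls from the example data
t_Calltaken <- c("2019-01-01 00:10:50", "2019-01-01 00:12:30",
                 "2019-01-01 00:17:00", "2019-01-01 00:20:50")
t_Hangup    <- c("2019-01-01 00:33:10", "2019-01-01 00:35:10",
                 "2019-01-01 01:07:33", "2019-01-01 00:38:50")

start_minute_id <- minute_id(t_Calltaken)
end_minute_id   <- minute_id(t_Hangup)

# Blocked minute IDs per call, both endpoints included
blocked_minutes <- Map(seq, start_minute_id, end_minute_id)
```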
Callcenter %>%
  # Minutes since the start of the year, rounded to the nearest minute
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>%
  rowwise() %>%
  # One vector of blocked minute IDs per call (endpoints included)
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1)))
# A tibble: 10 × 6
# Rowwise:
line t_Calltaken t_Hangup start_minute_id end_minute_id blocked_minutes
<chr> <chr> <chr> <dbl> <dbl> <list>
1 A 2019-01-01 00:10:50 2019-01-01 00:33:10 11 33 <dbl [23]>
2 B 2019-01-01 00:12:30 2019-01-01 00:35:10 12 35 <dbl [24]>
3 C 2019-01-01 00:17:00 2019-01-01 01:07:33 17 68 <dbl [52]>
4 D 2019-01-01 00:20:50 2019-01-01 00:38:50 21 39 <dbl [19]>
5 A 2019-01-01 00:35:20 2019-01-01 00:49:27 35 49 <dbl [15]>
6 B 2019-01-01 00:42:50 2019-01-01 01:22:40 43 83 <dbl [41]>
7 D 2019-01-01 00:48:50 2019-01-01 01:10:41 49 71 <dbl [23]>
8 A 2019-01-01 01:03:20 2019-01-01 01:26:10 63 86 <dbl [24]>
9 C 2019-01-01 01:10:50 2019-01-01 01:47:44 71 108 <dbl [38]>
10 D 2019-01-01 01:23:50 2019-01-01 01:51:15 84 111 <dbl [28]>
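I now want to group by line and concatenate all the lists of blocked minute IDs for each line.
How can I do that?
As a next step, I want to count how often each blocked minute ID occurs to get the maximum number of lines blocked in parallel. Is there a more efficient way to do this?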
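On the efficiency question: expanding every call into one row per minute grows with the total call duration. A common alternative, sketched here in base R, is an event sweep. Each call contributes +1 at its start minute and -1 in the minute after its (inclusive) end minute, and the running sum of these changes is the number of active calls. Its cost scales with the number of calls rather than the number of blocked minutes. Note that it counts calls, not distinct lines, so it equals the number of busy lines only under the assumption that a line never carries two calls in the same minute ID (which rounding could cause). The function name `max_concurrent` is hypothetical.

```r
# Hypothetical helper: maximum number of calls active in the same minute,
# given inclusive [start, end] minute IDs for each call.
max_concurrent <- function(start, end) {
  # +1 when a call starts, -1 in the minute after it ends
  times  <- c(start, end + 1)
  deltas <- c(rep(1, length(start)), rep(-1, length(end)))
  # Net change per event minute, in increasing time order
  net <- tapply(deltas, times, sum)
  # Running total = active calls from each event minute until the next one
  max(cumsum(net))
}

# Minute IDs taken from the question's table
start_minute_id <- c(11, 12, 17, 21, 35, 43, 49, 63, 71, 84)
end_minute_id   <- c(33, 35, 68, 39, 49, 83, 71, 86, 108, 111)

max_concurrent(start_minute_id, end_minute_id)
```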
Edit:
I would like to get output like this, for example:
  line blocked_minutes
1    A c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86)
2    B c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83)
3    C c(17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108)
4    D c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)
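A minimal base-R sketch of that grouping, as an alternative to a tidyverse pipeline: build each call's blocked minutes with `Map(seq, ...)`, `split()` the resulting list by line, and `unlist()` each group. The minute IDs are copied from the table above; `blocked_by_line` and `result` are illustrative names.

```r
line            <- c("A", "B", "C", "D", "A", "B", "D", "A", "C", "D")
start_minute_id <- c(11, 12, 17, 21, 35, 43, 49, 63, 71, 84)
end_minute_id   <- c(33, 35, 68, 39, 49, 83, 71, 86, 108, 111)

# One vector of blocked minute IDs per call (endpoints included)
blocked_minutes <- Map(seq, start_minute_id, end_minute_id)

# Group the per-call vectors by line and concatenate them in call order
blocked_by_line <- lapply(split(blocked_minutes, line), unlist, use.names = FALSE)

# One row per line with a list column of minute IDs
result <- data.frame(line = names(blocked_by_line))
result$blocked_minutes <- unname(blocked_by_line)
```

Because the calls on each line are listed in time order and do not overlap, the concatenated vectors come out sorted; with other data, wrapping the `unlist()` result in `sort(unique(...))` would be a safer choice.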
Not sure if this is what you want, but:
Callcenter %>%
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>%
  rowwise() %>%
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>%
  # One row per (call, blocked minute)
  unnest_longer(blocked_minutes) %>%
  # Collect all rows of each line into one nested data frame ...
  group_by(line) %>%
  nest() %>%
  # ... then spread it back out, one list column per original column
  unnest_wider(col = data)
# A tibble: 4 x 6
# Groups: line [4]
line t_Calltaken t_Hangup start_minute_id end_minute_id blocked_minutes
<chr> <list> <list> <list> <list> <list>
1 A <chr [62]> <chr [62]> <dbl [62]> <dbl [62]> <dbl [62]>
2 B <chr [65]> <chr [65]> <dbl [65]> <dbl [65]> <dbl [65]>
3 C <chr [90]> <chr [90]> <dbl [90]> <dbl [90]> <dbl [90]>
4 D <chr [70]> <chr [70]> <dbl [70]> <dbl [70]> <dbl [70]>
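Thanks to @Ben G's answer, I found a solution for my actual goal: the maximum number of lines blocked at the same time.
With unnest_longer(), the lists in blocked_minutes can be unnested; after that I only need to reshape the data frame to get the result.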
Callcenter %>%
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",units="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",units="mins")))) %>%
  rowwise() %>%
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>%
  # One row per (call, blocked minute)
  unnest_longer(blocked_minutes) %>%
  # Indicator: this line is busy in this minute
  mutate(value=1) %>%
  # One row per minute ID, one column per line (NA = line free)
  pivot_wider(id_cols=blocked_minutes, names_from="line",values_from="value") %>%
  # Number of busy lines in each minute
  mutate(sum_blocked_lines=rowSums(.[,2:ncol(.)],na.rm=TRUE)) %>%
  summarize(max_blocked_lines=max(sum_blocked_lines))
Result:
# A tibble: 1 × 1
max_blocked_lines
<dbl>
1 4
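The `pivot_wider()` step can also be replaced by counting how many distinct lines occupy each minute ID. A minimal base-R sketch, using the minute IDs from the question's table: `unique()` keeps each (line, minute) pair once, which guards against rounding putting two consecutive calls on the same line into the same minute (a case `pivot_wider()` would report as values not uniquely identified), and `table()` then counts lines per minute. The variable names are illustrative.

```r
line            <- c("A", "B", "C", "D", "A", "B", "D", "A", "C", "D")
start_minute_id <- c(11, 12, 17, 21, 35, 43, 49, 63, 71, 84)
end_minute_id   <- c(33, 35, 68, 39, 49, 83, 71, 86, 108, 111)

# Expand each call into one (line, minute) row, endpoints included
n_minutes <- end_minute_id - start_minute_id + 1
occupied <- data.frame(
  line   = rep(line, times = n_minutes),
  minute = unlist(Map(seq, start_minute_id, end_minute_id))
)

# Keep each (line, minute) pair once, then count lines per minute
lines_per_minute <- table(unique(occupied)$minute)

max_blocked_lines <- max(lines_per_minute)
# Minute IDs at which that maximum is reached
busiest_minutes <- as.numeric(names(lines_per_minute)[lines_per_minute == max_blocked_lines])
```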