Count overlapping time frames in a pandas dataframe, grouped by person
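I am using the top solution from an earlier post to count the rows whose start and end times overlap with a given row. However, I need the overlaps to be determined within each group rather than across the whole dataframe.
The data I am working with has the start and end times of conversations and the name of the person involved: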
id start_time end_time name
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally
This is the solution from the earlier post:
ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
df['overlap'] = (ends & starts).sum(0)
df
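However, this records an overlap between conversations 3 and 4, whereas I am only looking for overlaps among conversations 1 - 3 or among 4 - 5.
What I currently get: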
id start_time end_time name overlap
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob 2
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob 1
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob 1
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally 1
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally 0
What I would like to get:
id start_time end_time name overlap
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob 2
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob 1
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob 0
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally 1
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally 0
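To see where the cross-person count comes from, it may help to list the pairs that the broadcast comparison flags. The sketch below rebuilds the sample rows and prints every (id, id) pair where the second conversation starts inside the first. The names `inside` and `pairs` are illustrative only, and `.to_numpy()` is used here in place of `.values` as an assumption about what behaves the same across pandas versions.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "start_time": pd.to_datetime([
        "2021-02-10 10:37:35", "2021-02-10 11:09:39", "2021-02-10 12:10:33",
        "2021-02-10 15:05:08", "2021-02-10 21:07:26",
    ]),
    "end_time": pd.to_datetime([
        "2021-02-10 12:16:22", "2021-02-10 13:06:25", "2021-02-10 17:06:26",
        "2021-02-10 21:07:05", "2021-02-10 21:26:37",
    ]),
    "name": ["Bob", "Bob", "Bob", "Sally", "Sally"],
})

start = df["start_time"].to_numpy()
end = df["end_time"].to_numpy()

# inside[i, j] is True when row j starts strictly inside row i's interval.
# start[:, None] turns the vector into a column, so the comparison
# broadcasts to a 5 x 5 matrix of all (i, j) pairs.
inside = (start > start[:, None]) & (start < end[:, None])

# List the flagged pairs by id (illustrative helper, not from the original post).
ids = df["id"].to_numpy()
pairs = [(int(ids[i]), int(ids[j])) for i, j in zip(*inside.nonzero())]
print(pairs)  # [(1, 2), (1, 3), (2, 3), (3, 4)]
```

The pair (3, 4) joins one of Bob's conversations with one of Sally's, which is the overlap that should not be counted.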
I think this might do what you need.
Add an extra & condition that also requires the names to match:
ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
same_group = (df['name'].values == df['name'].values[:, None])
# sum across axis=1 !!!
df['overlap'] = (ends & starts & same_group).sum(1)
df
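For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the sample rows from the question, uses `.to_numpy()` instead of `.values` (an assumption on my part, mainly so the name column is a plain NumPy array regardless of pandas version), and combines the two time comparisons into one mask named `inside`, which is not a name from the original code.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "start_time": pd.to_datetime([
        "2021-02-10 10:37:35", "2021-02-10 11:09:39", "2021-02-10 12:10:33",
        "2021-02-10 15:05:08", "2021-02-10 21:07:26",
    ]),
    "end_time": pd.to_datetime([
        "2021-02-10 12:16:22", "2021-02-10 13:06:25", "2021-02-10 17:06:26",
        "2021-02-10 21:07:05", "2021-02-10 21:26:37",
    ]),
    "name": ["Bob", "Bob", "Bob", "Sally", "Sally"],
})

start = df["start_time"].to_numpy()
end = df["end_time"].to_numpy()
name = df["name"].to_numpy()

# inside[i, j]: row j starts strictly inside row i's interval.
inside = (start > start[:, None]) & (start < end[:, None])
# same_group[i, j]: rows i and j belong to the same person.
same_group = name == name[:, None]

# Summing along axis=1 counts, for each row i, the same-person rows j
# that start while conversation i is still running.
df["overlap"] = (inside & same_group).sum(axis=1)
print(df[["id", "name", "overlap"]])
```

On these rows the counts come out as 2, 1, 0, 0, 0. Conversation 3 no longer picks up Sally's conversation 4. Note that with the timestamps exactly as printed in the question, conversation 5 starts 21 seconds after conversation 4 ends, so row 4 gets 0 here rather than the 1 shown in the desired table. Since the masks are n x n, memory grows quadratically with the number of rows, which may matter for large frames.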