Compute duration for multiple scenarios in pandas

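I have a dataframe with several IDs. I want to slice it with a sliding time window and compute the duration of each ID that appears in the window. Some time slices contain only one ID, while others contain several.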

When several IDs appear, I can capture the duration of each ID as follows:

Dataframe with multiple IDs

id,date,value
1,2012-01-01 00:09:45,1
1,2012-01-01 00:09:50,1
2,2012-01-01 00:09:55,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:30:10,1
2,2012-01-01 00:30:15,1
3,2012-01-01 00:30:20,1
3,2012-01-01 00:30:25,1
3,2012-01-01 00:30:30,1
1,2012-01-01 00:30:45,1


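import pandas as pd

df = pd.read_csv('df.csv')
df['date'] = pd.to_datetime(df['date'])

# True on the first row of each run of consecutive equal IDs
diff_ids = df['id'] != df['id'].shift(1)
df = df[diff_ids].copy()
df['start'] = df['date']
# A run ends where the next run starts; the last run has no successor, so its end is NaT
df['end'] = df['date'].shift(-1)
df['duration'] = df['end'] - df['start']
print(df)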

Output

id date                 value  start                 end                  duration
1  2012-01-01 00:09:45  1      2012-01-01 00:09:45   2012-01-01 00:09:55  00:00:10
2  2012-01-01 00:09:55  1      2012-01-01 00:09:55   2012-01-01 00:30:20  00:20:25
3  2012-01-01 00:30:20  1      2012-01-01 00:30:20   2012-01-01 00:30:45  00:00:25
1  2012-01-01 00:30:45  1      2012-01-01 00:30:45   NaT                  NaT

Following the same logic, how can the case below, where only a single ID appears, also be handled?

Dataframe with a single ID

id,date,value
2,2012-01-01 00:09:45,1
2,2012-01-01 00:09:50,1
2,2012-01-01 00:09:55,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:30:10,1
2,2012-01-01 00:30:15,1
2,2012-01-01 00:30:20,1
2,2012-01-01 00:30:25,1
2,2012-01-01 00:30:30,1
2,2012-01-01 00:30:45,1

Expected output:

id date                 value  start                 end                  duration
2  2012-01-01 00:09:45  1      2012-01-01 00:09:45   2012-01-01 00:30:45  00:21:00
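For reference, running the multi-ID code unchanged on this single-ID frame shows why it does not give that result: the run filter keeps only the first row, and since there is no following run, `shift(-1)` leaves `end` and therefore `duration` as NaT. A minimal reproduction (the CSV is inlined with `io.StringIO` only so the snippet runs on its own):

```python
import io

import pandas as pd

# Single-ID sample data from above, inlined so the snippet is self-contained
csv = """id,date,value
2,2012-01-01 00:09:45,1
2,2012-01-01 00:09:50,1
2,2012-01-01 00:09:55,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:30:10,1
2,2012-01-01 00:30:15,1
2,2012-01-01 00:30:20,1
2,2012-01-01 00:30:25,1
2,2012-01-01 00:30:30,1
2,2012-01-01 00:30:45,1
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=["date"])

# Same logic as the multi-ID code: keep only the first row of each run of equal IDs
diff_ids = df["id"] != df["id"].shift(1)
runs = df[diff_ids].copy()
runs["start"] = runs["date"]
runs["end"] = runs["date"].shift(-1)  # no following run, so this is NaT
runs["duration"] = runs["end"] - runs["start"]
print(runs)
```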

If there is only one ID, you can do this:

>>> df["date"] = pd.to_datetime(df["date"])
>>> df.sort_values("date").head(1).assign(start=df["date"].min(), end=df["date"].max(), duration=df["date"].max() - df["date"].min())
id date                 value  start                 end                  duration
2  2012-01-01 00:09:45  1      2012-01-01 00:09:45   2012-01-01 00:30:45  0 days 00:21:00
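A sketch of one way to cover both cases with the same logic (not taken from the answer above): label each run of consecutive equal IDs with `shift` + `cumsum`, aggregate each run with `groupby`, and take a run's end as the start of the next run. Falling back to the run's own last timestamp for the final run, instead of NaT, is an assumption chosen so that the single-ID case reproduces the expected output; the function name `run_durations` is hypothetical.

```python
import io

import pandas as pd


def run_durations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per run of consecutive equal IDs.

    Assumption: a run ends where the next run starts (as in the question's
    multi-ID code); the last run, which has no successor, ends at its own
    last timestamp instead of NaT.
    """
    df = df.sort_values("date", kind="stable")
    # A new run starts whenever the ID differs from the previous row's ID;
    # cumsum turns those boundaries into run labels 1, 2, 3, ...
    run = (df["id"] != df["id"].shift()).cumsum().rename("run")
    runs = df.groupby(run).agg(
        id=("id", "first"),
        start=("date", "min"),
        last=("date", "max"),
    )
    # End of each run = start of the next run, else the run's own last timestamp
    runs["end"] = runs["start"].shift(-1).fillna(runs["last"])
    runs["duration"] = runs["end"] - runs["start"]
    return runs.drop(columns="last").reset_index(drop=True)


# Multi-ID sample data from the question
MULTI_CSV = """id,date,value
1,2012-01-01 00:09:45,1
1,2012-01-01 00:09:50,1
2,2012-01-01 00:09:55,1
2,2012-01-01 00:10:00,1
2,2012-01-01 00:30:10,1
2,2012-01-01 00:30:15,1
3,2012-01-01 00:30:20,1
3,2012-01-01 00:30:25,1
3,2012-01-01 00:30:30,1
1,2012-01-01 00:30:45,1
"""
multi = pd.read_csv(io.StringIO(MULTI_CSV), parse_dates=["date"])
# The single-ID sample has the same timestamps, all with id 2
single = multi.assign(id=2)

print(run_durations(multi))
print(run_durations(single))
```

With the multi-ID data this gives durations of 10 s, 20 min 25 s, 25 s and 0 for the runs of IDs 1, 2, 3 and 1; with the single-ID data it gives one row lasting 21 minutes. If the final run should stay NaT, as in the question's multi-ID output, drop the `fillna` call (the single-ID row then becomes NaT as well). To get per-window durations, the function can be called on each window's slice of the frame.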