How to calculate timestamp differences in a sequence in Python

My DataFrame looks like this:

0   2021-01-01 01:00:00+ 00:00
1   2021-01-01 01:05:00+ 00:00
2   2021-01-01 01:10:00+ 00:00
3   2021-01-01 01:15:00+ 00:00
4   2021-01-04 06:00:00+ 00:00
5   2021-01-04 06:05:00+ 00:00

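This column is a timestamp. I want to compute the duration, start, and end of each period, where a period is a run of rows in which consecutive rows are no more than 5 minutes apart. For example, here the result I would like to get is:

How can I do this?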

IIUC, you can use a custom group with groupby and agg:

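import pandas as pd

# ensure datetime if the column holds strings
# (the literal '+ 00:00' suffix is matched by the format, so the result is tz-naive)
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S+ 00:00')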

# compute a custom group: the id increases each time the gap to the previous row exceeds 5min
group = df['timestamp'].diff().gt('5min').cumsum()

# aggregate each group into its start, end, and duration
out = (
    df.groupby(group)['timestamp']
      .agg(start='min', end='max', delta=lambda g: g.max() - g.min())
)

Output:

                        start                 end           delta
timestamp                                                        
0         2021-01-01 01:00:00 2021-01-01 01:15:00 0 days 00:15:00
1         2021-01-04 06:00:00 2021-01-04 06:05:00 0 days 00:05:00
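
To see why the grouping key works, it can help to look at the intermediate values. The following is a minimal, self-contained sketch that rebuilds the sample data from the question and splits the one-liner into steps. The names gap and new_period are introduced here only for illustration and are not part of the answer above, and pd.Timedelta('5min') is used in place of the string '5min' purely to make the comparison explicit.

```python
import pandas as pd

# Sample data copied from the question (the "+ 00:00" suffix is kept literally).
df = pd.DataFrame({'timestamp': [
    '2021-01-01 01:00:00+ 00:00',
    '2021-01-01 01:05:00+ 00:00',
    '2021-01-01 01:10:00+ 00:00',
    '2021-01-01 01:15:00+ 00:00',
    '2021-01-04 06:00:00+ 00:00',
    '2021-01-04 06:05:00+ 00:00',
]})
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S+ 00:00')

# Gap to the previous row; the first row has no predecessor, so its gap is NaT.
gap = df['timestamp'].diff()

# True where a new period starts (gap strictly greater than 5 minutes).
# NaT compares as False, so the first row does not start an extra group.
new_period = gap > pd.Timedelta('5min')

# A running count of period starts gives every row its group id.
group = new_period.cumsum()

# Show the three steps side by side (hypothetical column names for display only).
print(pd.DataFrame({'gap': gap, 'new_period': new_period, 'group': group}))
```

Because the comparison is strict, rows exactly 5 minutes apart stay in the same period, which matches the "no more than 5 minutes" requirement. For the sample data the group ids come out as 0, 0, 0, 0, 1, 1.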