How to calculate timestamp difference in sequence python
My DF looks like this:
0 2021-01-01 01:00:00+ 00:00
1 2021-01-01 01:05:00+ 00:00
2 2021-01-01 01:10:00+ 00:00
3 2021-01-01 01:15:00+ 00:00
4 2021-01-04 06:00:00+ 00:00
5 2021-01-04 06:05:00+ 00:00
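This column holds timestamps. I want to compute the duration, start, and end of each period, where a period is a run of rows with no more than 5 minutes between consecutive rows. For the example here, the result I want is:
- 15 minutes, from 2021-01-01 01:00:00+ 00:00 to 2021-01-01 01:15:00+ 00:00
- 5 minutes, from 2021-01-04 06:00:00+ 00:00 to 2021-01-04 06:05:00+ 00:00
How can I do this?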
IIUC, you can use a custom group and agg:
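import pandas as pd

# ensure datetime if string (the literal '+ 00:00' suffix is matched as plain text)
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S+ 00:00')
# compute a custom group for consecutive values within 5min:
# a gap greater than 5min starts a new group (assumes rows are sorted by time)
group = df['timestamp'].diff().gt('5min').cumsum()
# aggregate: first and last timestamp of each group, and their difference
out = (df
       .groupby(group)['timestamp']
       .agg(start='min', end='max', delta=lambda g: g.max() - g.min())
      )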
Output:
                        start                 end           delta
timestamp
0         2021-01-01 01:00:00 2021-01-01 01:15:00 0 days 00:15:00
1         2021-01-04 06:00:00 2021-01-04 06:05:00 0 days 00:05:00
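For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea, rebuilt on the sample data from the question. The names `period_id` and `minutes` are my own illustrative choices, not part of the original answer, and the sketch assumes the rows are already sorted by time.

```python
import pandas as pd

# Sample data copied from the question (the "+ 00:00" suffix is kept verbatim).
df = pd.DataFrame({
    "timestamp": [
        "2021-01-01 01:00:00+ 00:00",
        "2021-01-01 01:05:00+ 00:00",
        "2021-01-01 01:10:00+ 00:00",
        "2021-01-01 01:15:00+ 00:00",
        "2021-01-04 06:00:00+ 00:00",
        "2021-01-04 06:05:00+ 00:00",
    ]
})

# Parse the strings; "+ 00:00" is matched as literal text, so the result is tz-naive.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S+ 00:00")

# A gap longer than 5 minutes marks the first row of a new period.
# The first diff is NaT, which compares as False, so the first row stays in period 0.
starts_new_period = df["timestamp"].diff() > pd.Timedelta(minutes=5)

# The running count of period starts gives each row a period id (hypothetical name).
period_id = starts_new_period.cumsum().rename("period")

# First and last timestamp per period, then the duration derived from them.
out = df.groupby(period_id)["timestamp"].agg(start="min", end="max")
out["delta"] = out["end"] - out["start"]
out["minutes"] = out["delta"].dt.total_seconds() / 60  # duration as a plain number

print(out)
```

Computing `delta` from the aggregated `start` and `end` columns avoids the lambda and gives the same values; the extra `minutes` column is only there in case a plain number is easier to work with than a Timedelta.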