Find the last row in the dataframe within 10 seconds of the current timestamp

Question

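I have a time series sampled at sub-second intervals (i.e. I may have 2-5 samples per second, taken at varying microsecond offsets). I would like a computed column in the dataframe that holds the last entry within a 10-second window starting at the current index time.

There may be no entry exactly 10 seconds later, so I am trying to use the last entry before the 10-second mark. The index is a DatetimeIndex.

I have written a simple loop to do this and was wondering whether there is a more efficient way.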

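import pandas as pd
# For each row, store the last value found before that row's timestamp + 10 seconds
for row_index in df.index:
    df.loc[row_index, 'calculate'] = df.loc[df.index < row_index + pd.Timedelta('10s'), 'value'].iloc[-1]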

Example

timestamp                value  calculate
2020-01-27 09:30:00.100      6         42
2020-01-27 09:30:00.803     10         25
2020-01-27 09:30:06.000     42         25
2020-01-27 09:30:10.102     25         25
2020-01-27 09:33:01.801      3         20
2020-01-27 09:33:05.100     10         20
2020-01-27 09:33:11.700     20         20

Answer 1

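Consider df to be the dataframe without the expected calculate column. You can create a dummy dataframe called df_minus10 whose timestamps are 10 seconds earlier than those in df.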

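# Copy df, rename the value column, and shift every timestamp 10 seconds earlier
# (assumes timestamp is a regular column; call df.reset_index() first if it is the index)
df_minus10 = df.copy().rename(columns={'value': 'calculate'})
df_minus10['timestamp'] -= pd.Timedelta(seconds=10)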

Then you can use pd.merge_asof to select, for each timestamp in df, the largest timestamp in the dummy dataframe that is smaller than or equal to it.

pd.merge_asof(df, df_minus10, on='timestamp', direction='backward')
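Putting the two steps together on the sample data from the question gives something like the sketch below. It assumes the timestamps live in a DatetimeIndex named timestamp (as the question describes), so the index is moved into a column first. The helper name last_within_window is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the example table in the question.
df = pd.DataFrame(
    {"value": [6, 10, 42, 25, 3, 10, 20]},
    index=pd.to_datetime([
        "2020-01-27 09:30:00.100",
        "2020-01-27 09:30:00.803",
        "2020-01-27 09:30:06.000",
        "2020-01-27 09:30:10.102",
        "2020-01-27 09:33:01.801",
        "2020-01-27 09:33:05.100",
        "2020-01-27 09:33:11.700",
    ]),
)
df.index.name = "timestamp"


def last_within_window(frame, window="10s"):
    """Hypothetical helper: add a 'calculate' column via pd.merge_asof."""
    # merge_asof needs the key as a sorted column, so move the index out.
    left = frame.reset_index()
    # Dummy frame: same rows, value renamed, timestamps shifted back by the window.
    right = left.rename(columns={"value": "calculate"})
    right["timestamp"] -= pd.Timedelta(window)
    # For each left timestamp t, pick the last shifted row with timestamp <= t,
    # i.e. the last original row at or before t + window.
    merged = pd.merge_asof(left, right, on="timestamp", direction="backward")
    return merged.set_index("timestamp")


result = last_within_window(df)
print(result)
```

Note that direction='backward' matches on smaller-or-equal, so a row lying exactly 10 seconds ahead is included, whereas the loop in the question uses a strict comparison.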

Please check with your own data whether this is actually more efficient.
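One rough way to run that check is sketched below. The data is synthetic and invented purely for the comparison (about 2,000 irregular samples over ten minutes), the function names are hypothetical, and the loop uses <= so that it matches the smaller-or-equal behaviour of merge_asof. Timings will vary by machine and data size, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic, irregularly sampled series (made up for this comparison only).
rng = np.random.default_rng(0)
offsets_ms = np.sort(rng.choice(600_000, size=2_000, replace=False))
index = pd.Timestamp("2020-01-27 09:30:00") + pd.to_timedelta(offsets_ms, unit="ms")
df = pd.DataFrame(
    {"value": rng.integers(0, 100, size=len(index))},
    index=pd.DatetimeIndex(index, name="timestamp"),
)


def loop_version(frame, window="10s"):
    # Row-by-row reference: last value at or before each timestamp + window.
    delta = pd.Timedelta(window)
    out = [frame.loc[frame.index <= t + delta, "value"].iloc[-1] for t in frame.index]
    return pd.Series(out, index=frame.index, name="calculate")


def merge_asof_version(frame, window="10s"):
    # Same result via a shifted copy and a backward as-of merge.
    left = frame.reset_index()
    right = left.rename(columns={"value": "calculate"})
    right["timestamp"] = right["timestamp"] - pd.Timedelta(window)
    merged = pd.merge_asof(left, right, on="timestamp", direction="backward")
    return merged.set_index("timestamp")["calculate"]


start = time.perf_counter()
slow = loop_version(df)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = merge_asof_version(df)
merge_seconds = time.perf_counter() - start

# Both approaches should agree before the timings mean anything.
same = bool((slow.to_numpy() == fast.to_numpy()).all())
print(f"identical results: {same}")
print(f"loop: {loop_seconds:.4f}s, merge_asof: {merge_seconds:.4f}s")
```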

python

timedelta

pandas