Get the element immediately available after x seconds in a datetime index dataframe

Question

I have the following dataframe:

Time                     response_a  response_b 
2022-01-22 16:00:00.222    101.01     0.5          
2022-01-22 16:00:00.347    101.7      0.6          
...
2022-01-22 16:00:01.100    102        0.7          
2022-01-22 16:00:01.255    103        0.8

I would like to get the following:

Time                     response_a  response_b  response_a_lagged     response_b_lagged
2022-01-22 16:00:00.222    101.01     0.5          103                    0.8
2022-01-22 16:00:00.347    101.7      0.6          etc                    etc
...
2022-01-22 16:00:01.100    102        0.7          etc                    etc
2022-01-22 16:00:01.255    103        0.8          etc                    etc

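Time is a DatetimeIndex. I want two new columns: response_a lagged by x seconds (e.g. 1 second), and the same for response_b. As in the example above, if there is no value exactly 1 second later, it should just take the next available one.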

I tried df.shift(periods=1, freq='s') and df.shift(periods=1000, freq='ms'),
but I get the following error: "cannot reindex from a duplicate axis"
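Independently of the duplicate-label error, shift with freq does not implement the "next value after x seconds" rule. A minimal sketch on the sample rows from the table above (Time as the index): shift(freq=...) moves the index labels forward and keeps each value on its own row, so aligning the result back onto the original timestamps only finds exact matches, and with irregular timestamps there are none.

```python
import pandas as pd

# Sample rows from the question, with Time as a DatetimeIndex
df = pd.DataFrame(
    {"response_a": [101.01, 101.7, 102, 103],
     "response_b": [0.5, 0.6, 0.7, 0.8]},
    index=pd.to_datetime(["2022-01-22 16:00:00.222", "2022-01-22 16:00:00.347",
                          "2022-01-22 16:00:01.100", "2022-01-22 16:00:01.255"]),
)
df.index.name = "Time"

# shift(freq=...) adds 1 s to every index label; values stay on their rows
shifted = df.shift(periods=1, freq="s")
print(shifted.index[0])

# Aligning back onto the original timestamps needs exact label matches,
# and none of the shifted labels coincide with an original one
aligned = shifted.reindex(df.index)
print(aligned["response_a"].isna().all())
```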

I used SELECT DISTINCT when fetching the data from the database, so I don't think I should have any duplicate index values?
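SELECT DISTINCT removes rows that are identical in every selected column, so two rows that share a timestamp but differ in another column both survive. A quick check in pandas, sketched on hypothetical rows (the 101.02 value is made up for illustration):

```python
import pandas as pd

# Hypothetical rows: every full row is distinct, but two share the same Time
df = pd.DataFrame({
    "Time": pd.to_datetime(["2022-01-22 16:00:00.222",
                            "2022-01-22 16:00:00.222",
                            "2022-01-22 16:00:01.255"]),
    "response_a": [101.01, 101.02, 103],
    "response_b": [0.5, 0.5, 0.8],
}).set_index("Time")

# Whole rows are unique (what SELECT DISTINCT guarantees)...
print(df.reset_index().duplicated().any())
# ...but the index still contains a repeated label
print(df.index.is_unique)
# Show every row whose timestamp occurs more than once
print(df[df.index.duplicated(keep=False)])
```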

Thanks!

Answer 1

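import pandas as pd

col = ['Time', 'response_a', 'response_b']

data = [
    ['2022-01-22 16:00:00.222', 101.01, 0.5],
    ['2022-01-22 16:00:00.347', 101.7, 0.6],
    ['2022-01-22 16:00:01.100', 102, 0.7],
    ['2022-01-22 16:00:01.255', 103, 0.8],
]

df = pd.DataFrame(data, columns=col)
df['Time'] = pd.to_datetime(df['Time'])

# Copy with timestamps moved back by 1 s: a row observed at t is keyed at t - 1 s
df_temp = df.copy()
df_temp['Time'] -= pd.Timedelta(seconds=1)

# For each row, take the first df_temp row at or after its Time,
# i.e. the first original row at or after Time + 1 s
result = pd.merge_asof(df, df_temp, on='Time', suffixes=('', '_lagged'), direction='forward')
print(result)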

Your condition is to match the first record that is at least 1 second later. For this kind of inexact (nearest-key) matching, you want pd.merge_asof.
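The question keeps Time as a DatetimeIndex rather than a column. A sketch of the same merge_asof approach under that assumption: move the index into a column, merge, then restore it. merge_asof requires both inputs to be sorted on the key, hence the sort_index() call.

```python
import pandas as pd

# Sample rows from the question, with Time as a DatetimeIndex
df = pd.DataFrame(
    {"response_a": [101.01, 101.7, 102, 103],
     "response_b": [0.5, 0.6, 0.7, 0.8]},
    index=pd.to_datetime(["2022-01-22 16:00:00.222", "2022-01-22 16:00:00.347",
                          "2022-01-22 16:00:01.100", "2022-01-22 16:00:01.255"]),
)
df.index.name = "Time"

lag = pd.Timedelta(seconds=1)

# merge_asof needs both sides sorted on the key; turn the index into a column
left = df.sort_index().reset_index()
right = left.copy()
right["Time"] -= lag  # values observed at t are now keyed at t - lag

# Forward as-of match: first right key >= left Time, i.e. first row at or after Time + lag
out = (pd.merge_asof(left, right, on="Time", suffixes=("", "_lagged"),
                     direction="forward")
         .set_index("Time"))
print(out)
```

Rows with no observation at least 1 s later (the last three in this sample) get NaN in the lagged columns.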

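However, pd.merge_asof does not take an offset argument such as "1 second later", so the small trick here is to create a helper frame, df_temp, whose timestamps are moved back by 1 second. A row observed at t then sits at t - 1 s, so a forward as-of match from Time lands on the first row at or after Time + 1 s. For now you only need a minimum of 1 second, but if you later also need a maximum, read about the tolerance parameter of pd.merge_asof.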
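A sketch of how tolerance would cap the wait, wrapped in a hypothetical helper lagged_asof (the helper name and its max_wait parameter are illustrative, not part of pandas). With direction='forward', tolerance limits how far the matched row may lie past Time + lag. In the sample data the only match is 33 ms past its target, so a 50 ms cap keeps it and a 10 ms cap drops it.

```python
import pandas as pd

def lagged_asof(df, lag, max_wait=None):
    """Hypothetical helper: for each row, attach the first row whose Time is
    at or after Time + lag. If max_wait is given, the match may be at most
    max_wait past Time + lag (passed to merge_asof as tolerance)."""
    right = df.copy()
    right["Time"] -= lag  # a row observed at t now sits at t - lag
    return pd.merge_asof(df, right, on="Time", suffixes=("", "_lagged"),
                         direction="forward", tolerance=max_wait)

df = pd.DataFrame({
    "Time": pd.to_datetime(["2022-01-22 16:00:00.222", "2022-01-22 16:00:00.347",
                            "2022-01-22 16:00:01.100", "2022-01-22 16:00:01.255"]),
    "response_a": [101.01, 101.7, 102, 103],
    "response_b": [0.5, 0.6, 0.7, 0.8],
})

lag = pd.Timedelta(seconds=1)
loose = lagged_asof(df, lag, max_wait=pd.Timedelta(milliseconds=50))
strict = lagged_asof(df, lag, max_wait=pd.Timedelta(milliseconds=10))
print(loose)
print(strict)
```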

Reference: https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html
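If merge_asof is not convenient, the same "first row at or after t + x" rule can be sketched with DatetimeIndex.searchsorted. This is a swapped-in technique, not the answer's method, and it assumes the index is sorted. The helper name lagged_by_searchsorted is hypothetical. Assigning plain NumPy arrays avoids index alignment, so repeated timestamps do not trigger a reindexing error; among tied timestamps, side="left" picks the first one.

```python
import numpy as np
import pandas as pd

def lagged_by_searchsorted(df, lag):
    """Hypothetical helper: for each timestamp t, copy the values of the first
    row whose timestamp is >= t + lag (NaN if there is none).
    Assumes df has a sorted DatetimeIndex."""
    if not df.index.is_monotonic_increasing:
        raise ValueError("index must be sorted")
    # Position of the first label >= t + lag for every t
    pos = df.index.searchsorted(df.index + lag, side="left")
    valid = pos < len(df)  # positions past the end mean "no later row"
    out = df.copy()
    for col in df.columns:
        vals = np.full(len(df), np.nan)
        vals[valid] = df[col].to_numpy()[pos[valid]]
        out[f"{col}_lagged"] = vals  # plain array: no index alignment
    return out

df = pd.DataFrame(
    {"response_a": [101.01, 101.7, 102, 103],
     "response_b": [0.5, 0.6, 0.7, 0.8]},
    index=pd.to_datetime(["2022-01-22 16:00:00.222", "2022-01-22 16:00:00.347",
                          "2022-01-22 16:00:01.100", "2022-01-22 16:00:01.255"]),
)
df.index.name = "Time"

result = lagged_by_searchsorted(df, pd.Timedelta(seconds=1))
print(result)
```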


Tags: python, indexing, time, dataframe