Combination of offset, lookup, and apply in a pandas dataframe
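I have a pandas dataframe df with two columns: date and price. For each row, I want to offset the date by 3 days, then look up the price on that date and fill the column price_new. Note that the dates are not necessarily in order or complete. See the desired output: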
df_new =
date price price_new
2021-01-01 37 N/A
2021-01-05 38 9
2021-01-06 35 42
2021-01-07 9 11
2021-01-08 11 ...
2021-01-11 42
2021-01-12 11
...
The dataframe df:
import pandas as pd
import numpy as np
np.random.seed(50)
start_date = "2021-01-01"
end_date= "2021-01-31"
date_range = pd.bdate_range(start=start_date,end=end_date)
df = pd.DataFrame({'date':date_range, 'price':np.random.randint(5, 50, len(date_range))})
Thanks in advance!
IIUC, you can first resample to make sure every calendar day is present, then use shift():
new_df = df.set_index("date").resample("D").last().reset_index()
new_df["price_new"] = new_df["price"].shift(-3)
new_df = new_df[new_df["date"].isin(df["date"])].reset_index(drop=True)
>>> new_df
date price price_new
0 2021-01-01 37.0 NaN
1 2021-01-05 38.0 11.0
2 2021-01-06 35.0 NaN
3 2021-01-07 9.0 NaN
4 2021-01-08 11.0 42.0
5 2021-01-11 42.0 NaN
6 2021-01-12 11.0 NaN
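To see why the resample step matters, here is a minimal sketch on a three-row frame (the small frame is made up for illustration, reusing the first three rows of the df shown in this answer). Resampling to daily frequency adds a NaN row for every missing calendar day, so shifting by three rows becomes the same as shifting by three days:

```python
import pandas as pd

# Hypothetical three-row frame with a gap between 2021-01-01 and 2021-01-05.
small = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-05", "2021-01-06"]),
    "price": [37, 38, 35],
})

# One row per calendar day; days without data get NaN.
daily = small.set_index("date").resample("D").last()

# After resampling, a shift of -3 rows looks exactly 3 days ahead.
daily["price_new"] = daily["price"].shift(-3)
print(daily)
```

The rows for 2021-01-02 to 2021-01-04 exist only in the intermediate frame; the isin() filter in the answer drops them again at the end.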
The df used:
>>> df
date price
0 2021-01-01 37
1 2021-01-05 38
2 2021-01-06 35
3 2021-01-07 9
4 2021-01-08 11
5 2021-01-11 42
6 2021-01-12 11
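As an alternative sketch (not part of the answer above), the same lookup can be written as a direct mapping from date to price, which does not depend on row order. It assumes every date appears only once in df; the name price_by_date is introduced here for illustration:

```python
import pandas as pd

# The seven-row frame shown above, rebuilt by hand so the example is self-contained.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01", "2021-01-05", "2021-01-06", "2021-01-07",
        "2021-01-08", "2021-01-11", "2021-01-12",
    ]),
    "price": [37, 38, 35, 9, 11, 42, 11],
})

# Series indexed by date; the lookup below assumes dates are unique.
price_by_date = df.set_index("date")["price"]

# Offset each date by 3 calendar days and look up the price on that day.
# Dates with no matching row become NaN.
df["price_new"] = (df["date"] + pd.Timedelta(days=3)).map(price_by_date)
print(df)
```

On this frame it fills price_new only for 2021-01-05 and 2021-01-08, matching the resample and shift result, and the price column keeps its integer dtype.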