Dynamic columns in an existing DataFrame with values assigned from previous records in Python
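I am trying to dynamically insert new columns into my existing DataFrame and fill them with values from previous records located 5, 10, 15, 20, ..., 300 rows before the current index. Basically, I need to add 180 columns (3 columns from each of 60 previous records) and their values to every row of the DataFrame.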
Existing DF output:
index  Price  Bid    ask
1      70.15  70.10  70.20
2      70.18  70.13  70.16
3      70.09  70.09  70.10
4      70.14  70.11  70.14
5      70.13  70.12  70.13
6      70.16  70.16  70.19
7      70.14  70.12  70.16
:
:
n
The new DF should look like this:
index  Price  Bid    Ask    5_Price  5_Bid  5_Ask  10_Price  10_Bid  10_Ask  ...  300_Price  300_Bid  300_Ask
1      70.15  70.10  70.20  70.03    70.04  70.05  70.05     70.06   70.07   ...  70.14      70.14    70.16
2      70.18  70.13  70.16  70.01    70.02  70.03  70.09     70.09   70.10   ...  70.17      70.16    70.17
3      70.09  70.09  70.10  70.05    70.03  70.05  70.06     70.04   70.06   ...  70.20      70.18    70.19
4      70.14  70.11  70.14  70.07    70.09  70.10  70.05     70.06   70.07   ...  70.14      70.14    70.16
5      70.13  70.12  70.13  70.03    70.04  70.05  70.05     70.06   70.07   ...  70.14      70.14    70.16
6      70.16  70.16  70.19  70.18    70.16  70.19  70.13     70.11   70.12   ...  70.17      70.18    70.19
7      70.14  70.12  70.16  70.20    70.21  70.19  70.21     70.19   70.21   ...  70.08      70.07    70.09
:
:
n
Code: this works, but it takes a very long time (several hours) even for 50,000 records.
t = 5
p = 305
last = '_Last'
cbid = '_Cbid'
cask = '_Cask'
for i in range(p, df.shape[0]):
    n = i - 300
    for j in range(5,p,5):
        in_last = str(j) + last
        in_cbid = str(j) + cbid
        in_cask = str(j) + cask
        df.loc[i, in_last] = df.loc[n+j, "Nusdinr Last"]
        df.loc[i, in_cbid] = df.loc[n+j, "Nusdinr Close Bid"]
        df.loc[i, in_cask] = df.loc[n+j, "Nusdinr Close Ask"]...
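Here is a more efficient approach using 'shift', which moves whole columns down by a given number of rows instead of assigning one cell at a time.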
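# Lags 5, 10, ..., 300; range(0, 300, 5) would start at 0 and stop at 295
for shift_rows in range(5, 305, 5):
    col_names = [f"{shift_rows}_{col}" for col in ["Price", "Bid", "ask"]]
    # shift(k) moves values down k rows, so row i receives the values of row i - k
    df[col_names] = df[["Price", "Bid", "ask"]].shift(shift_rows)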
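If inserting 180 columns one group at a time turns out to be slow, or pandas warns about a fragmented frame, one possible variation is to build every shifted block first and join them with a single 'pd.concat'. The sketch below is only an illustration under assumptions: the helper name 'add_lagged_columns' and the synthetic 'demo' frame are made up for this example, and the column names "Price", "Bid", "ask" are taken from the sample table in the question, not from the real data.

```python
import numpy as np
import pandas as pd


def add_lagged_columns(df, cols, lags):
    """Return a new frame with one extra column per (lag, column) pair.

    Hypothetical helper for illustration; not part of pandas.
    """
    lagged = [
        # shift(lag) moves values down by `lag` rows, so row i holds row i - lag;
        # add_prefix turns "Price" into e.g. "5_Price"
        df[cols].shift(lag).add_prefix(f"{lag}_")
        for lag in lags
    ]
    # One concat call instead of 60 separate column insertions
    return pd.concat([df, *lagged], axis=1)


# Synthetic stand-in for the real price data (assumed values)
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(70.1, 0.05, size=(1000, 3)).round(2),
    columns=["Price", "Bid", "ask"],
)

result = add_lagged_columns(demo, ["Price", "Bid", "ask"], range(5, 305, 5))
print(result.shape)  # 3 original columns + 60 lags * 3 columns
```

The first rows of each new column are NaN because no earlier record exists for them; for example, the "300_*" columns only have values from the 301st row onward. Whether those rows should be dropped or kept depends on how the result is used.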