Python 根据前 3 行滚动 window 向 DataFrame 添加列
Python add columns to DataFame with rolling window based on 3 previous rows
我有这样一个数据框:
original = pd.DataFrame(np.random.randint(0,100,size=(10, 3)), columns=["P1_day", "P1_week", "P1_month"])
print(original)
P1_day P1_week P1_month
0 50 17 55
1 45 3 10
2 93 79 84
3 99 38 33
4 44 35 35
5 25 43 87
6 38 88 56
7 20 66 6
8 4 23 6
9 39 75 3
我需要从 original
数据框的第 3 行开始生成新的数据框,并根据滚动添加新的 9 列 window 定义为具有相应前缀的前 3 行:[_0,_1, _2 ].因此,它是 original
dataframe 中索引为 [0,1,2] 的行。
例如,接下来的 3 列将来自 original.iloc[0]
,
接下来的 3 列将来自 original.iloc[1]
,
最后 3 列将来自 original.iloc[2]
我试图通过下一个代码解决它:
subset_shifted = original[["P1_day", "P1_week", "P1_month"]].shift(3)
subset_shifted.columns = ["P1_day_0", "P1_week_0", "P1_month_0"]
original_ = pd.concat([original, subset_shifted], axis = 1)
print(original_)
结果,我有 3 个附加列,其值来自前 0 行:
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0
0 50 17 55 NaN NaN NaN
1 45 3 10 NaN NaN NaN
2 93 79 84 NaN NaN NaN
3 99 38 33 50.0 17.0 55.0
4 44 35 35 45.0 3.0 10.0
5 25 43 87 93.0 79.0 84.0
6 38 88 56 99.0 38.0 33.0
7 20 66 6 44.0 35.0 35.0
8 4 23 6 25.0 43.0 87.0
9 39 75 3 38.0 88.0 56.0
在下一次迭代中,我使用相同的方法 shift(2)
并接收了来自 original.iloc[1]
的列。
在最后一次迭代中,我做了 shift(1)
并得到了预期的结果,因为:
result = original_.iloc[3:]
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0 P1_day_1 P1_week_1 P1_month_1 P1_day_2 P1_week_2 P1_month_2
3 99 38 33 50.0 17.0 55.0 45.0 3.0 10.0 93.0 79.0 84.0
4 44 35 35 45.0 3.0 10.0 93.0 79.0 84.0 99.0 38.0 33.0
5 25 43 87 93.0 79.0 84.0 99.0 38.0 33.0 44.0 35.0 35.0
6 38 88 56 99.0 38.0 33.0 44.0 35.0 35.0 25.0 43.0 87.0
7 20 66 6 44.0 35.0 35.0 25.0 43.0 87.0 38.0 88.0 56.0
8 4 23 6 25.0 43.0 87.0 38.0 88.0 56.0 20.0 66.0 6.0
9 39 75 3 38.0 88.0 56.0 20.0 66.0 6.0 4.0 23.0 6.0
问题:
有没有办法用我描述的更好的方法来解决这个任务?谢谢。
除非你想要所有这些额外的数据帧,否则你可以直接将新列添加到原始 df 中:
import pandas as pd
import numpy as np
original = pd.DataFrame(
np.random.randint(0,100,size=(10, 3)),
columns=["P1_day", "P1_week", "P1_month"],
)
original[
["P1_day_0", "P1_week_0", "P1_month_0"]
] = original[
["P1_day", "P1_week", "P1_month"]
].shift(3)
print(original)
输出:
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0
0 2 35 26 NaN NaN NaN
1 99 4 96 NaN NaN NaN
2 4 67 6 NaN NaN NaN
3 76 33 31 2.0 35.0 26.0
4 84 60 98 99.0 4.0 96.0
5 57 1 58 4.0 67.0 6.0
6 35 70 96 76.0 33.0 31.0
7 81 32 39 84.0 60.0 98.0
8 25 4 38 57.0 1.0 58.0
9 83 4 60 35.0 70.0 96.0
编辑:OP 提出后续问题:
yes, for the first row it makes sense. But, my task is to add first 3 rows with index 0-1-2 as new 9 columns for the respected rows started from 3rd index. In your output row with index 1st is not added to the 3rd row as 3 columns. In my code that's why I used shift(2) and shift(1) iteratively.
这是如何迭代完成的:
import pandas as pd
import numpy as np
original = pd.DataFrame(
np.random.randint(0,100,size=(10, 3)),
columns=["P1_day", "P1_week", "P1_month"],
)
for shift, n in ((3,0),(2,1),(1,2)):
original[
[f"P1_day_{n}", f"P1_week_{n}", f"P1_month_{n}"]
] = original[
["P1_day", "P1_week", "P1_month"]
].shift(shift)
pd.set_option('display.max_columns', None)
print(original.iloc[3:])
输出:
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0 P1_day_1 \
3 58 43 74 26.0 56.0 82.0 56.0
4 44 27 40 56.0 87.0 38.0 31.0
5 2 90 4 31.0 32.0 87.0 58.0
6 90 70 6 58.0 43.0 74.0 44.0
7 1 31 57 44.0 27.0 40.0 2.0
8 96 22 69 2.0 90.0 4.0 90.0
9 13 98 47 90.0 70.0 6.0 1.0
P1_week_1 P1_month_1 P1_day_2 P1_week_2 P1_month_2
3 87.0 38.0 31.0 32.0 87.0
4 32.0 87.0 58.0 43.0 74.0
5 43.0 74.0 44.0 27.0 40.0
6 27.0 40.0 2.0 90.0 4.0
7 90.0 4.0 90.0 70.0 6.0
8 70.0 6.0 1.0 31.0 57.0
9 31.0 57.0 96.0 22.0 69.0
编辑 2:不在这里做任何假设,但如果您的最终目标是从所有这些新列中的数据中获得类似于 4 周期移动平均线的东西,那么您可能根本不需要它们。您可以使用 pandas.DataFrame.rolling 代替:
import pandas as pd
import numpy as np
original = pd.DataFrame(
np.random.randint(0,100,size=(10, 3)),
columns=["P1_day", "P1_week", "P1_month"],
)
original[
["P1_day_4PMA", "P1_week_4PMA", "P1_month_4PMA"]
] = original[
["P1_day", "P1_week", "P1_month"]
].rolling(4).mean()
pd.set_option('display.max_columns', None)
print(original.iloc[3:])
输出:
P1_day P1_week P1_month P1_day_4PMA P1_week_4PMA P1_month_4PMA
3 1 13 48 31.25 38.00 55.00
4 10 4 40 22.00 21.00 45.75
5 7 76 0 5.50 23.75 37.00
6 5 69 9 5.75 40.50 24.25
7 63 31 82 21.25 45.00 32.75
8 26 67 22 25.25 60.75 28.25
9 89 41 40 45.75 52.00 38.25
我有这样一个数据框:
original = pd.DataFrame(np.random.randint(0,100,size=(10, 3)), columns=["P1_day", "P1_week", "P1_month"])
print(original)
P1_day P1_week P1_month
0 50 17 55
1 45 3 10
2 93 79 84
3 99 38 33
4 44 35 35
5 25 43 87
6 38 88 56
7 20 66 6
8 4 23 6
9 39 75 3
我需要从 original
数据框的第 3 行开始生成新的数据框,并根据滚动添加新的 9 列 window 定义为具有相应前缀的前 3 行:[_0,_1, _2 ].因此,它是 original
dataframe 中索引为 [0,1,2] 的行。
例如,接下来的 3 列将来自 original.iloc[0]
,
接下来的 3 列将来自 original.iloc[1]
,
最后 3 列将来自 original.iloc[2]
我试图通过下一个代码解决它:
subset_shifted = original[["P1_day", "P1_week", "P1_month"]].shift(3)
subset_shifted.columns = ["P1_day_0", "P1_week_0", "P1_month_0"]
original_ = pd.concat([original, subset_shifted], axis = 1)
print(original_)
结果,我有 3 个附加列,其值来自前 0 行:
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0
0 50 17 55 NaN NaN NaN
1 45 3 10 NaN NaN NaN
2 93 79 84 NaN NaN NaN
3 99 38 33 50.0 17.0 55.0
4 44 35 35 45.0 3.0 10.0
5 25 43 87 93.0 79.0 84.0
6 38 88 56 99.0 38.0 33.0
7 20 66 6 44.0 35.0 35.0
8 4 23 6 25.0 43.0 87.0
9 39 75 3 38.0 88.0 56.0
在下一次迭代中,我使用相同的方法 shift(2)
并接收了来自 original.iloc[1]
的列。
在最后一次迭代中,我做了 shift(1)
并得到了预期的结果,因为:
result = original_.iloc[3:]
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0 P1_day_1 P1_week_1 P1_month_1 P1_day_2 P1_week_2 P1_month_2
3 99 38 33 50.0 17.0 55.0 45.0 3.0 10.0 93.0 79.0 84.0
4 44 35 35 45.0 3.0 10.0 93.0 79.0 84.0 99.0 38.0 33.0
5 25 43 87 93.0 79.0 84.0 99.0 38.0 33.0 44.0 35.0 35.0
6 38 88 56 99.0 38.0 33.0 44.0 35.0 35.0 25.0 43.0 87.0
7 20 66 6 44.0 35.0 35.0 25.0 43.0 87.0 38.0 88.0 56.0
8 4 23 6 25.0 43.0 87.0 38.0 88.0 56.0 20.0 66.0 6.0
9 39 75 3 38.0 88.0 56.0 20.0 66.0 6.0 4.0 23.0 6.0
问题: 有没有办法用我描述的更好的方法来解决这个任务?谢谢。
除非你想要所有这些额外的数据帧,否则你可以直接将新列添加到原始 df 中:
import pandas as pd
import numpy as np
original = pd.DataFrame(
np.random.randint(0,100,size=(10, 3)),
columns=["P1_day", "P1_week", "P1_month"],
)
original[
["P1_day_0", "P1_week_0", "P1_month_0"]
] = original[
["P1_day", "P1_week", "P1_month"]
].shift(3)
print(original)
输出:
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0
0 2 35 26 NaN NaN NaN
1 99 4 96 NaN NaN NaN
2 4 67 6 NaN NaN NaN
3 76 33 31 2.0 35.0 26.0
4 84 60 98 99.0 4.0 96.0
5 57 1 58 4.0 67.0 6.0
6 35 70 96 76.0 33.0 31.0
7 81 32 39 84.0 60.0 98.0
8 25 4 38 57.0 1.0 58.0
9 83 4 60 35.0 70.0 96.0
编辑:OP 提出后续问题:
yes, for the first row it makes sense. But, my task is to add first 3 rows with index 0-1-2 as new 9 columns for the respected rows started from 3rd index. In your output row with index 1st is not added to the 3rd row as 3 columns. In my code that's why I used shift(2) and shift(1) iteratively.
这是如何迭代完成的:
import pandas as pd
import numpy as np
original = pd.DataFrame(
np.random.randint(0,100,size=(10, 3)),
columns=["P1_day", "P1_week", "P1_month"],
)
for shift, n in ((3,0),(2,1),(1,2)):
original[
[f"P1_day_{n}", f"P1_week_{n}", f"P1_month_{n}"]
] = original[
["P1_day", "P1_week", "P1_month"]
].shift(shift)
pd.set_option('display.max_columns', None)
print(original.iloc[3:])
输出:
P1_day P1_week P1_month P1_day_0 P1_week_0 P1_month_0 P1_day_1 \
3 58 43 74 26.0 56.0 82.0 56.0
4 44 27 40 56.0 87.0 38.0 31.0
5 2 90 4 31.0 32.0 87.0 58.0
6 90 70 6 58.0 43.0 74.0 44.0
7 1 31 57 44.0 27.0 40.0 2.0
8 96 22 69 2.0 90.0 4.0 90.0
9 13 98 47 90.0 70.0 6.0 1.0
P1_week_1 P1_month_1 P1_day_2 P1_week_2 P1_month_2
3 87.0 38.0 31.0 32.0 87.0
4 32.0 87.0 58.0 43.0 74.0
5 43.0 74.0 44.0 27.0 40.0
6 27.0 40.0 2.0 90.0 4.0
7 90.0 4.0 90.0 70.0 6.0
8 70.0 6.0 1.0 31.0 57.0
9 31.0 57.0 96.0 22.0 69.0
编辑 2:不在这里做任何假设,但如果您的最终目标是从所有这些新列中的数据中获得类似于 4 周期移动平均线的东西,那么您可能根本不需要它们。您可以使用 pandas.DataFrame.rolling 代替:
import pandas as pd
import numpy as np
original = pd.DataFrame(
np.random.randint(0,100,size=(10, 3)),
columns=["P1_day", "P1_week", "P1_month"],
)
original[
["P1_day_4PMA", "P1_week_4PMA", "P1_month_4PMA"]
] = original[
["P1_day", "P1_week", "P1_month"]
].rolling(4).mean()
pd.set_option('display.max_columns', None)
print(original.iloc[3:])
输出:
P1_day P1_week P1_month P1_day_4PMA P1_week_4PMA P1_month_4PMA
3 1 13 48 31.25 38.00 55.00
4 10 4 40 22.00 21.00 45.75
5 7 76 0 5.50 23.75 37.00
6 5 69 9 5.75 40.50 24.25
7 63 31 82 21.25 45.00 32.75
8 26 67 22 25.25 60.75 28.25
9 89 41 40 45.75 52.00 38.25