How to create a new column with the rolling mean of another column - Python
I have a DataFrame:
import pandas as pd
import numpy as np
d1 = {
    "id": [11, 11, 11, 11, 11, 24, 24, 24, 24, 24, 24],
    "PT": [3, 3, 6, 0, 9, 4, 2, 3, 4, 5, 0],
    "date": ["2010-10-10", "2010-10-12", "2010-10-16", "2010-10-18", "2010-10-22", "2010-10-10", "2010-10-11", "2010-10-14", "2010-10-16", "2010-10-19", "2010-10-22"],
}
df1 = pd.DataFrame(data=d1)
    id  PT        date
0   11   3  2010-10-10
1   11   3  2010-10-12
2   11   6  2010-10-16
3   11   0  2010-10-18
4   11   9  2010-10-22
5   24   4  2010-10-10
6   24   2  2010-10-11
7   24   3  2010-10-14
8   24   4  2010-10-16
9   24   5  2010-10-19
10  24   0  2010-10-22
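I want to compute the rolling mean of the PT column separately for each id, over a moving window of that id's last 3 entries. If an id does not yet have 3 entries, I want the mean of the last 2 entries, or just the current entry. The result should look like this: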
    id  PT        date  Rolling mean last 3
0   11   3  2010-10-10                    3
1   11   3  2010-10-12                    3
2   11   6  2010-10-16                    4
3   11   0  2010-10-18                    3
4   11   9  2010-10-22                    5
5   24   4  2010-10-10                    4
6   24   2  2010-10-11                    3
7   24   3  2010-10-14                    3
8   24   4  2010-10-16                    3
9   24   5  2010-10-19                    4
10  24   0  2010-10-22                    3
I tried the following and got:
df1["rolling"]=df1.groupby('id')['PT'].rolling(3).mean().reset_index(0,drop=True)
    id  PT        date  rolling
0   11   3  2010-10-10      NaN
1   11   3  2010-10-12      NaN
2   11   6  2010-10-16      4.0
3   11   0  2010-10-18      3.0
4   11   9  2010-10-22      5.0
5   24   4  2010-10-10      NaN
6   24   2  2010-10-11      NaN
7   24   3  2010-10-14      3.0
8   24   4  2010-10-16      3.0
9   24   5  2010-10-19      4.0
10  24   0  2010-10-22      3.0
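So my problem is that when an id has fewer than 3 entries so far, I get NaN instead of the mean of the 2 previous entries or of the current entry.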
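You are probably looking for the min_periods parameter. With an integer window, min_periods defaults to the window size, which is why the first two rows of each id come out as NaN. Setting min_periods=1 lets the mean be computed from however many entries are available: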
df1['rolling'] = df1.groupby('id')['PT'].rolling(window=3, min_periods=1).mean().reset_index(0, drop=True)
    id  PT        date  rolling
0   11   3  2010-10-10      3.0
1   11   3  2010-10-12      3.0
2   11   6  2010-10-16      4.0
3   11   0  2010-10-18      3.0
4   11   9  2010-10-22      5.0
5   24   4  2010-10-10      4.0
6   24   2  2010-10-11      3.0
7   24   3  2010-10-14      3.0
8   24   4  2010-10-16      3.0
9   24   5  2010-10-19      4.0
10  24   0  2010-10-22      3.0
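As an alternative sketch (not part of the original answer), the same per-id rolling mean can be written with groupby(...).transform, which returns values aligned to the original index, so the reset_index step is not needed. This assumes the rows are already in the desired order within each id, as they are in the sample data. It should reproduce the values in the table above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [11, 11, 11, 11, 11, 24, 24, 24, 24, 24, 24],
    "PT": [3, 3, 6, 0, 9, 4, 2, 3, 4, 5, 0],
})

# transform applies the rolling mean within each id group and returns
# a Series aligned to df1's original index.
# min_periods=1 allows windows with fewer than 3 entries at the start of a group.
df1["rolling"] = df1.groupby("id")["PT"].transform(
    lambda s: s.rolling(window=3, min_periods=1).mean()
)

print(df1)
```

If the frame might not be sorted, sorting by id and date first (for example with df1.sort_values(["id", "date"])) would be one way to make sure "last 3 entries" means the 3 most recent dates.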