使用 Pandas 滚动时如何忽略 NaN
How to ignore NaN when applying rolling with Pandas
我可以知道如何在 df
上执行 rolling
时忽略 NaN
。
例如,给定 df
,在列 a
上执行滚动,但忽略 Nan
。这个要求应该产生一些东西
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 11180.00
5 13426.0 12050.00
6 NaN NaN
7 17514.0 19350.00
8 18408.0 20142.50
9 22128.0 20142.50
10 22520.0 21018.67
11 NaN NaN
12 26164.0 27796.67
13 26590.0 21627.25
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
但是,我不知道应该调整这一行的哪一部分以获得上述预期输出
df['a'].rolling(2 * w + 1, center=True, min_periods=1).mean()
目前,以下代码
import numpy as np
import pandas as pd
arr=[[6772],[7182],[8570],[11078],[11646],[13426],[np.nan],[17514],[18408],
[22128],[22520],[np.nan],[26164],[26590],[30636],[3119],[32166],[34774]]
df=pd.DataFrame(arr,columns=['a'])
w = 2
df['avg'] = df['a'].rolling(2 * w + 1, center=True, min_periods=1).mean()
产生了以下内容,
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 11180.00
5 13426.0 13416.00 <<<
6 NaN 15248.50 <<<
7 17514.0 17869.00 <<<
8 18408.0 20142.50
9 22128.0 20142.50
10 22520.0 22305.00 <<<
11 NaN 24350.50 <<<
12 26164.0 26477.50 <<<
13 26590.0 21627.25
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
<<<
表示值与预期输出不同的地方
更新:
添加fillna
df['avg'] = df['a'].fillna(value=0).rolling(2 * w + 1, center=True, min_periods=1).mean()
没有产生预期的输出
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 8944.00
5 13426.0 10732.80
6 NaN 12198.80
7 17514.0 14295.20
8 18408.0 16114.00
9 22128.0 16114.00
10 22520.0 17844.00
11 NaN 19480.40
12 26164.0 21182.00
13 26590.0 17301.80
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
12050=总和(11078 11646 13426)/3
IIUC,你想重启 满足nan
的滚动。一种方法是使用 pandas.DataFrame.groupby
:
m = df.isna().any(1)
df["avg"] = (df["a"].groupby(m.cumsum())
.rolling(2 * w + 1, center=True, min_periods=1).mean()
.reset_index(level=0, drop=True))
df["avg"] = df["avg"][~m]
输出:
a avg
0 6772.0 7508.000000
1 7182.0 8400.500000
2 8570.0 9049.600000
3 11078.0 10380.400000
4 11646.0 11180.000000
5 13426.0 12050.000000
6 NaN NaN
7 17514.0 19350.000000
8 18408.0 20142.500000
9 22128.0 20142.500000
10 22520.0 21018.666667
11 NaN NaN
12 26164.0 27796.666667
13 26590.0 21627.250000
14 30636.0 23735.000000
15 3119.0 25457.000000
16 32166.0 25173.750000
17 34774.0 23353.000000
我可以知道如何在 df
上执行 rolling
时忽略 NaN
。
例如,给定 df
,在列 a
上执行滚动,但忽略 Nan
。这个要求应该产生一些东西
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 11180.00
5 13426.0 12050.00
6 NaN NaN
7 17514.0 19350.00
8 18408.0 20142.50
9 22128.0 20142.50
10 22520.0 21018.67
11 NaN NaN
12 26164.0 27796.67
13 26590.0 21627.25
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
但是,我不知道应该调整这一行的哪一部分以获得上述预期输出
df['a'].rolling(2 * w + 1, center=True, min_periods=1).mean()
目前,以下代码
import numpy as np
import pandas as pd
arr=[[6772],[7182],[8570],[11078],[11646],[13426],[np.nan],[17514],[18408],
[22128],[22520],[np.nan],[26164],[26590],[30636],[3119],[32166],[34774]]
df=pd.DataFrame(arr,columns=['a'])
w = 2
df['avg'] = df['a'].rolling(2 * w + 1, center=True, min_periods=1).mean()
产生了以下内容,
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 11180.00
5 13426.0 13416.00 <<<
6 NaN 15248.50 <<<
7 17514.0 17869.00 <<<
8 18408.0 20142.50
9 22128.0 20142.50
10 22520.0 22305.00 <<<
11 NaN 24350.50 <<<
12 26164.0 26477.50 <<<
13 26590.0 21627.25
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
<<<
表示值与预期输出不同的地方
更新:
添加fillna
df['avg'] = df['a'].fillna(value=0).rolling(2 * w + 1, center=True, min_periods=1).mean()
没有产生预期的输出
a avg
0 6772.0 7508.00
1 7182.0 8400.50
2 8570.0 9049.60
3 11078.0 10380.40
4 11646.0 8944.00
5 13426.0 10732.80
6 NaN 12198.80
7 17514.0 14295.20
8 18408.0 16114.00
9 22128.0 16114.00
10 22520.0 17844.00
11 NaN 19480.40
12 26164.0 21182.00
13 26590.0 17301.80
14 30636.0 23735.00
15 3119.0 25457.00
16 32166.0 25173.75
17 34774.0 23353.00
12050=总和(11078 11646 13426)/3
IIUC,你想重启 满足nan
的滚动。一种方法是使用 pandas.DataFrame.groupby
:
m = df.isna().any(1)
df["avg"] = (df["a"].groupby(m.cumsum())
.rolling(2 * w + 1, center=True, min_periods=1).mean()
.reset_index(level=0, drop=True))
df["avg"] = df["avg"][~m]
输出:
a avg
0 6772.0 7508.000000
1 7182.0 8400.500000
2 8570.0 9049.600000
3 11078.0 10380.400000
4 11646.0 11180.000000
5 13426.0 12050.000000
6 NaN NaN
7 17514.0 19350.000000
8 18408.0 20142.500000
9 22128.0 20142.500000
10 22520.0 21018.666667
11 NaN NaN
12 26164.0 27796.666667
13 26590.0 21627.250000
14 30636.0 23735.000000
15 3119.0 25457.000000
16 32166.0 25173.750000
17 34774.0 23353.000000