使用 Pandas 滚动时如何忽略 NaN

How to ignore NaN when applying rolling with Pandas

我可以知道如何在 df 上执行 rolling 时忽略 NaN

例如,给定 df,在列 a 上执行滚动,但忽略 Nan。这个要求应该产生一些东西

          a       avg
0    6772.0   7508.00
1    7182.0   8400.50
2    8570.0   9049.60
3   11078.0  10380.40
4   11646.0  11180.00
5   13426.0  12050.00
6       NaN  NaN
7   17514.0  19350.00
8   18408.0  20142.50
9   22128.0  20142.50
10  22520.0  21018.67
11      NaN  NaN 
12  26164.0  27796.67
13  26590.0  21627.25
14  30636.0  23735.00
15   3119.0  25457.00
16  32166.0  25173.75
17  34774.0  23353.00

但是,我不知道应该调整这一行的哪一部分以获得上述预期输出

df['a'].rolling(2 * w + 1, center=True, min_periods=1).mean()

目前,以下代码

import numpy as np
import pandas as pd
arr=[[6772],[7182],[8570],[11078],[11646],[13426],[np.nan],[17514],[18408],
[22128],[22520],[np.nan],[26164],[26590],[30636],[3119],[32166],[34774]]
df=pd.DataFrame(arr,columns=['a'])
w = 2
df['avg'] = df['a'].rolling(2 * w + 1, center=True, min_periods=1).mean()

产生了以下内容,

 a       avg
0    6772.0   7508.00
1    7182.0   8400.50
2    8570.0   9049.60
3   11078.0  10380.40
4   11646.0  11180.00
5   13426.0  13416.00   <<<
6       NaN  15248.50   <<<
7   17514.0  17869.00   <<<
8   18408.0  20142.50
9   22128.0  20142.50
10  22520.0  22305.00   <<<
11      NaN  24350.50   <<<
12  26164.0  26477.50   <<<
13  26590.0  21627.25
14  30636.0  23735.00
15   3119.0  25457.00
16  32166.0  25173.75
17  34774.0  23353.00

<<< 表示值与预期输出不同的地方

更新:

添加fillna

df['avg'] = df['a'].fillna(value=0).rolling(2 * w + 1, center=True, min_periods=1).mean()

没有产生预期的输出

          a       avg
0    6772.0   7508.00
1    7182.0   8400.50
2    8570.0   9049.60
3   11078.0  10380.40
4   11646.0   8944.00
5   13426.0  10732.80
6       NaN  12198.80
7   17514.0  14295.20
8   18408.0  16114.00
9   22128.0  16114.00
10  22520.0  17844.00
11      NaN  19480.40
12  26164.0  21182.00
13  26590.0  17301.80
14  30636.0  23735.00
15   3119.0  25457.00
16  32166.0  25173.75
17  34774.0  23353.00

12050=总和(11078 11646 13426)/3

IIUC,你想重启 满足nan 的滚动。一种方法是使用 pandas.DataFrame.groupby:

m = df.isna().any(1)

df["avg"] = (df["a"].groupby(m.cumsum())
                    .rolling(2 * w + 1, center=True, min_periods=1).mean()
                    .reset_index(level=0, drop=True))
df["avg"] = df["avg"][~m]

输出:

          a           avg
0    6772.0   7508.000000
1    7182.0   8400.500000
2    8570.0   9049.600000
3   11078.0  10380.400000
4   11646.0  11180.000000
5   13426.0  12050.000000
6       NaN           NaN
7   17514.0  19350.000000
8   18408.0  20142.500000
9   22128.0  20142.500000
10  22520.0  21018.666667
11      NaN           NaN
12  26164.0  27796.666667
13  26590.0  21627.250000
14  30636.0  23735.000000
15   3119.0  25457.000000
16  32166.0  25173.750000
17  34774.0  23353.000000