Pandas rolling mean: don't change numbers to NaN in DataFrame

I am working with a pandas DataFrame that looks like this:

(N.B. offset is set as the index of the DataFrame)

offset         X         Y         Z
  0   -0.140137   -1.924316   -0.426758
 10   -2.789123   -1.111212   -0.416016
 20   -0.133789   -1.923828   -4.408691
 30   -0.101112   -1.457891   -0.425781
 40   -0.126465   -1.926758   -0.414062
 50   -0.137207   -1.916992   -0.404297
 60   -0.130371   -3.784591   -0.987654
 70   -0.125000   -1.918457   -0.403809
 80   -0.123456   -1.917480   -0.413574
 90   -0.126465   -1.926758   -0.333554

I apply a rolling mean with a window size of 5 to the DataFrame using the code below. I need to keep the window size at 5, and I need a value for every offset in the DataFrame (no NaN).

df = df.rolling(center=False, window=5).mean()

This gives me:

offset         X         Y         Z
 0.0       NaN       NaN       NaN
10.0       NaN       NaN       NaN
20.0       NaN       NaN       NaN
30.0       NaN       NaN       NaN
40.0 -0.658125 -1.668801 -1.218262
50.0 -0.657539 -1.667336 -1.213769
60.0 -0.125789 -2.202012 -1.328097
70.0 -0.124031 -2.200938 -0.527121
80.0 -0.128500 -2.292856 -0.524679
90.0 -0.128500 -2.292856 -0.508578

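I would like the first rows, which come out as NaN, to keep their original values, and the remaining rows to hold the rolling mean. Is there a simple way to do this? Thanks. The desired output is: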

offset         X         Y         Z
 0.0  -0.140137  -1.924316  -0.426758
10.0  -2.789123  -1.111212  -0.416016
20.0  -0.133789  -1.923828  -4.408691
30.0  -0.101112  -1.457891  -0.425781
40.0  -0.658125  -1.668801  -1.218262
50.0  -0.657539  -1.667336  -1.213769
60.0  -0.125789  -2.202012  -1.328097
70.0  -0.124031  -2.200938  -0.527121
80.0  -0.128500  -2.292856  -0.524679
90.0  -0.128500  -2.292856  -0.508578

You can fill the NaN values from the original df:

df.rolling(center=False, window=5).mean().fillna(df)
Out: 
               X         Y         Z
offset                              
0      -0.140137 -1.924316 -0.426758
10     -2.789123 -1.111212 -0.416016
20     -0.133789 -1.923828 -4.408691
30     -0.101112 -1.457891 -0.425781
40     -0.658125 -1.668801 -1.218262
50     -0.657539 -1.667336 -1.213769
60     -0.125789 -2.202012 -1.328097
70     -0.124031 -2.200938 -0.527121
80     -0.128500 -2.292856 -0.524679
90     -0.128500 -2.292856 -0.508578
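
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the data is exactly the table from the question, typed in by hand; the names rolled and result are just for illustration.

```python
import pandas as pd

# Data copied from the question; "offset" is used as the index.
df = pd.DataFrame(
    {
        "X": [-0.140137, -2.789123, -0.133789, -0.101112, -0.126465,
              -0.137207, -0.130371, -0.125000, -0.123456, -0.126465],
        "Y": [-1.924316, -1.111212, -1.923828, -1.457891, -1.926758,
              -1.916992, -3.784591, -1.918457, -1.917480, -1.926758],
        "Z": [-0.426758, -0.416016, -4.408691, -0.425781, -0.414062,
              -0.404297, -0.987654, -0.403809, -0.413574, -0.333554],
    },
    index=pd.Index(range(0, 100, 10), name="offset"),
)

# With window=5 and the default min_periods, the first 4 rows are NaN.
rolled = df.rolling(window=5).mean()

# fillna(df) aligns on index and columns, so each NaN cell is replaced
# by the original value at the same position; other cells are untouched.
result = rolled.fillna(df)
print(result)
```

Note that fillna(df) fills every NaN cell, so if the original data has gaps elsewhere, rolling results that are NaN because of those gaps would also be replaced by raw values wherever the original has one.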

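There is also a parameter, min_periods, that you can use. If you pass min_periods=1, the first row is just the first value, the second row is the mean of the first two values, and so on until the window is full. In some cases that may make more sense.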

df.rolling(center=False, window=5, min_periods=1).mean()
Out: 
               X         Y         Z
offset                              
0      -0.140137 -1.924316 -0.426758
10     -1.464630 -1.517764 -0.421387
20     -1.021016 -1.653119 -1.750488
30     -0.791040 -1.604312 -1.419311
40     -0.658125 -1.668801 -1.218262
50     -0.657539 -1.667336 -1.213769
60     -0.125789 -2.202012 -1.328097
70     -0.124031 -2.200938 -0.527121
80     -0.128500 -2.292856 -0.524679
90     -0.128500 -2.292856 -0.508578
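
To see the effect of min_periods in isolation, here is a small sketch on a toy Series (the numbers and the window of 3 are made up for illustration, not taken from the question):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Default: min_periods equals the window size, so the first 2 rows are NaN.
default = s.rolling(window=3).mean()

# min_periods=1: early rows average over however many values are available.
partial = s.rolling(window=3, min_periods=1).mean()

print(default.tolist())  # [nan, nan, 2.0, 3.0, 4.0, 5.0]
print(partial.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```

Once the window is full, both versions agree; they differ only in the first window - 1 rows. Keep in mind that those early values are averages over fewer points, which is not the same as keeping the original values unchanged.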

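Assuming you have no other rows that are entirely NaN, you can identify which rows of df_rolling are all NaN and replace them with the corresponding rows of the original. Example: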

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(13, 5))
df_rolling = df.rolling(center=False, window=5).mean()
# identify which rows are all NaN
idx = df_rolling.index[df_rolling.isnull().all(axis=1)]
# replace those rows with the original data
df_rolling.loc[idx, :] = df.loc[idx, :]
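
If that assumption does not hold, one possible variation (not the method above, just a sketch) is to overwrite the first window - 1 rows by position instead of searching for all-NaN rows. The missing row at position 8 and the seed are made up purely for illustration.

```python
import numpy as np
import pandas as pd

window = 5
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((13, 5)))
df.iloc[8] = np.nan  # hypothetical gap in the middle of the data

df_rolling = df.rolling(window=window).mean()

# Only the first window - 1 rows are NaN because the window is not yet full,
# so restore exactly those rows and leave later NaN rows alone.
df_rolling.iloc[: window - 1] = df.iloc[: window - 1].to_numpy()
print(df_rolling)
```

Here rows 8 to 12 stay NaN, because each of their windows contains the missing row, whereas the all-NaN lookup would have replaced rows 9 to 12 with raw, unsmoothed values.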