How can I fill NaN values in a df using the group mean?

I can fill in the missing numeric data with the Python code below:

df.fillna(df.select_dtypes(include='number').mean(), inplace=True)

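But this only fills the NaN values with the overall mean. I have a column containing a categorical variable, and I need to fill in the mean according to the category in that column.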

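Edit: Here is part of the df I am working with. I want to fill the NaN values in each column with that column's mean, grouped by the TFOPWG Disposition label.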

           TIC ID TFOPWG Disposition  TESS Mag  TESS Mag err          RA  \
TOI                                                                        
101.01  231663901                 KP   12.4069         0.006  318.737000   
102.01  149603524                 KP    9.7109         0.006   87.139833   
103.01  336732616                 KP   11.5232         0.008  312.457500   
104.01  231670397                 KP    9.8638         0.006  319.949708   
105.01  144065872                 KP    9.4995         0.006  337.457833   

              Dec  PM RA (mas/yr)  PM RA err (mas/yr)  PM Dec (mas/yr)  \
TOI                                                                      
101.01 -55.871864          12.641               0.044          -16.011   
102.01 -63.988328         -15.641               0.037           26.046   
103.01 -24.428694          10.426               0.070           15.620   
104.01 -58.148933          10.552               0.045          -10.658   
105.01 -48.003100          91.976               0.052           -6.861   

        Period (days)  Stellar Distance (pc)  Stellar Distance (pc) err  \
TOI                                                                       
101.01       1.430369                375.310                     4.4110   
102.01       4.411929                175.631                     0.5880   
103.01       3.547854                411.211                     7.7520   
104.01       4.087493                316.678                     2.9655   
105.01       2.184670                137.544                     0.7905   

        Stellar Eff Temp (K)  Stellar Eff Temp (K) err  \
TOI                                                      
101.01                5600.0                       NaN   
102.01                6280.0                       NaN   
103.01                6351.0                       NaN   
104.01                6036.0                       NaN   
105.01                5630.0                       NaN   

        Stellar log(g) (cm/s^2) err  Stellar Radius (R_Sun)  \
TOI                                                           
101.01                          NaN                0.890774   
102.01                          NaN                1.210000   
103.01                          NaN                1.400000   
104.01                          NaN                2.218670   
105.01                          NaN                1.240000   

        Stellar Radius (R_Sun) err  
TOI                                 
101.01                    0.043847  
102.01                    0.050000  
103.01                         NaN  
104.01                    0.102573  
105.01                    0.060000  

You can use groupby().transform() to put the group mean on every row, and then you can fillna:

df.fillna(df.groupby('category_column').transform('mean'), inplace=True)
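
To make the alignment step concrete, here is a minimal sketch on toy data. The frame and its column names ('group', 'x', 'y') are made up for illustration; 'group' stands in for the categorical column (for example 'TFOPWG Disposition'). It also restricts the operation to numeric columns, which is an assumption about what you want rather than something the one-liner above does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'group' plays the role of the categorical column,
# 'x' and 'y' are numeric columns with missing values.
df = pd.DataFrame({
    "group": ["KP", "KP", "KP", "PC", "PC"],
    "x": [1.0, np.nan, 3.0, 10.0, np.nan],
    "y": [np.nan, 2.0, 4.0, np.nan, 8.0],
})

# Only average numeric columns; text columns are left untouched.
num_cols = df.select_dtypes(include="number").columns.tolist()

# transform('mean') returns a frame with the same index as df, where each
# row holds the mean of the group that row belongs to (NaNs are skipped).
group_means = df.groupby("group")[num_cols].transform("mean")

# fillna with a DataFrame aligns on index and columns, so each NaN cell is
# replaced by the mean of its own group in its own column.
df[num_cols] = df[num_cols].fillna(group_means)

print(df)
```

If every value of a column is NaN within a group, that group's mean is itself NaN, so those cells stay missing and need a separate fallback (for instance the overall column mean).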