How can I fill NaN values in a df using the group mean?
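I can fill in missing numeric data with the Python code below:
df.fillna(df.select_dtypes(include='number').mean().iloc[0], inplace=True)
But this only fills the NaNs with the overall mean. I have a column containing a categorical variable, and I need to fill in the mean according to the category in that column.
Edit: This is part of the df I am working with. I want to fill the NaNs in each column with that column's own mean, computed per group of the TFOPWG Disposition label.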
TIC ID TFOPWG Disposition TESS Mag TESS Mag err RA
TOI
101.01 231663901 KP 12.4069 0.006 318.737000
102.01 149603524 KP 9.7109 0.006 87.139833
103.01 336732616 KP 11.5232 0.008 312.457500
104.01 231670397 KP 9.8638 0.006 319.949708
105.01 144065872 KP 9.4995 0.006 337.457833
Dec PM RA (mas/yr) PM RA err (mas/yr) PM Dec (mas/yr) \
TOI
101.01 -55.871864 12.641 0.044 -16.011
102.01 -63.988328 -15.641 0.037 26.046
103.01 -24.428694 10.426 0.070 15.620
104.01 -58.148933 10.552 0.045 -10.658
105.01 -48.003100 91.976 0.052 -6.861
Period (days) Stellar Distance (pc) Stellar Distance (pc) err \
TOI
101.01 1.430369 375.310 4.4110
102.01 4.411929 175.631 0.5880
103.01 3.547854 411.211 7.7520
104.01 4.087493 316.678 2.9655
105.01 2.184670 137.544 0.7905
Stellar Eff Temp (K) Stellar Eff Temp (K) err \
TOI
101.01 5600.0 NaN
102.01 6280.0 NaN
103.01 6351.0 NaN
104.01 6036.0 NaN
105.01 5630.0 NaN
Stellar log(g) (cm/s^2) err Stellar Radius (R_Sun) \
TOI
101.01 NaN 0.890774
102.01 NaN 1.210000
103.01 NaN 1.400000
104.01 NaN 2.218670
105.01 NaN 1.240000
Stellar Radius (R_Sun) err
TOI
101.01 0.043847
102.01 0.050000
103.01 NaN
104.01 0.102573
105.01 0.060000
You can use groupby().transform() to put each group's mean on every row of that group, and then pass the result to fillna:
df.fillna(df.groupby('category_column').transform('mean'), inplace=True)
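To make the one-liner above concrete, here is a minimal sketch on a small made-up frame. The column names `label`, `radius_err`, and `temp` and all values are hypothetical stand-ins, not taken from the question's data; restricting the transform to the numeric columns is my own precaution for frames that also contain other text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "label" plays the role of the categorical column
# (e.g. 'TFOPWG Disposition'); names and values are illustrative only.
df = pd.DataFrame({
    "label": ["KP", "KP", "KP", "PC", "PC"],
    "radius_err": [0.04, np.nan, 0.06, 0.10, np.nan],
    "temp": [5600.0, 6280.0, np.nan, 6000.0, 6100.0],
})

# Only average the numeric columns, so text columns cannot break the mean.
numeric_cols = list(df.select_dtypes(include="number").columns)

# transform('mean') returns a frame with the same index as df, where every
# row holds the mean of that row's group for each numeric column.
group_means = df.groupby("label")[numeric_cols].transform("mean")

# fillna aligns on index and columns, so each NaN gets its own group's mean.
filled = df.fillna(group_means)
print(filled)
```

One caveat: if a column is NaN for every row of a group (as 'Stellar Eff Temp (K) err' is in the rows shown in the question), that group's mean is itself NaN, so those cells stay NaN and would need a separate fallback, such as the overall column mean.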