How to fill missing data in a data frame based on grouped objects?
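I have a dataset in which I use some columns to group the data. The same dataset has other numeric columns that contain missing values. I want to fill the missing values in a column with the mean of the group that the missing entry belongs to.
Name of the pandas dataset: data
Columns on which the groups are based: ['A', 'B']
Column to be imputed with the group-based means: ['C']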
I think you can use groupby with transform:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,1,3],
[1,1,9],
[1,1,np.nan],
[2,2,8],
[2,1,4],
[2,2,np.nan],
[2,2,5]]
, columns=list('ABC'))
print(df)
A B C
0 1 1 3.0
1 1 1 9.0
2 1 1 NaN
3 2 2 8.0
4 2 1 4.0
5 2 2 NaN
6 2 2 5.0
# Fill each NaN in C with the mean of its (A, B) group
df['C'] = df.groupby(['A', 'B'])['C'].transform(lambda x: x.fillna(x.mean()))
print(df)
A B C
0 1 1 3.0
1 1 1 9.0
2 1 1 6.0
3 2 2 8.0
4 2 1 4.0
5 2 2 6.5
6 2 2 5.0
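The same group-wise fill can also be sketched without a lambda: compute the per-group mean once with transform('mean'), then pass the aligned result to fillna. The names group_means and filled are assumptions made for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 3], [1, 1, 9], [1, 1, np.nan],
     [2, 2, 8], [2, 1, 4], [2, 2, np.nan], [2, 2, 5]],
    columns=list("ABC"),
)

# Mean of C within each (A, B) group, broadcast back to every row.
# NaN values are skipped when the mean is computed.
group_means = df.groupby(["A", "B"])["C"].transform("mean")

# Only the missing entries are replaced; observed values are kept.
filled = df["C"].fillna(group_means)
print(filled.tolist())  # [3.0, 9.0, 6.0, 8.0, 4.0, 6.5, 5.0]
```

This avoids calling a Python function once per group, which may matter when there are many groups.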
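Another option is to fill each column with its own overall mean rather than the group mean:
df = df.fillna(df.mean())
Run on the original df, this fills the NaNs in column C with 5.8, which is the mean of column 'C'.
Output:
print(df)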
A B C
0 1 1 3.0
1 1 1 9.0
2 1 1 5.8
3 2 2 8.0
4 2 1 4.0
5 2 2 5.8
6 2 2 5.0
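The group mean and the column mean can be combined: fill with the group mean first, then fall back to the column mean for any group that has no observed value at all. This is only a sketch under assumed data; the rows below, including the all-missing group (3, 3), are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group (3, 3) has no observed value in C.
df = pd.DataFrame(
    [[1, 1, 3], [1, 1, 9], [1, 1, np.nan], [2, 2, 12], [3, 3, np.nan]],
    columns=list("ABC"),
)

# Group means; an all-NaN group yields NaN here.
group_means = df.groupby(["A", "B"])["C"].transform("mean")

# First use the group mean, then the overall column mean as a fallback.
result = df["C"].fillna(group_means).fillna(df["C"].mean())
print(result.tolist())  # [3.0, 9.0, 6.0, 12.0, 8.0]
```

Row 2 receives its group mean (6.0), while row 4 receives the column mean (8.0) because its group has nothing to average.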