Replace missing values based on another column
I'm trying to replace the missing values in a DataFrame based on another column, "Country":
>>> data.head()
Country Advanced skiers, freeriders Snow parks
0 Greece NaN NaN
1 Switzerland 5.0 5.0
2 USA NaN NaN
3 Norway NaN NaN
4 Norway 3.0 4.0
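Obviously this is only a small subset of the data, but I'm looking to replace all the NaN values with the average of each feature for the same country.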
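I've tried grouping the data by country and then calculating the average of each column. When I print the resulting array, it gives me the expected values. However, when I pass it to the .fillna() method, the data appears unchanged.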
I've tried @DSM's solution from this similar post, but I'm not sure how to apply it to multiple columns.
listOfRatings = ['Advanced skiers, freeriders', 'Snow parks']
print (data.groupby('Country')[listOfRatings].mean().fillna(0))
-> displays the expected results
data[listOfRatings] = data[listOfRatings].fillna(data.groupby('Country')[listOfRatings].mean().fillna(0))
-> appears to do nothing to the dataframe
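To see why the last line changes nothing, here is a minimal sketch that rebuilds the five rows from data.head() (assuming, as the question does, that they are the whole dataset) and inspects the index of the grouped means before handing them to fillna:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from data.head() above (assumed to be the full dataset)
data = pd.DataFrame({
    "Country": ["Greece", "Switzerland", "USA", "Norway", "Norway"],
    "Advanced skiers, freeriders": [np.nan, 5.0, np.nan, np.nan, 3.0],
    "Snow parks": [np.nan, 5.0, np.nan, np.nan, 4.0],
})
listOfRatings = ["Advanced skiers, freeriders", "Snow parks"]

# One row per country: the index holds country names, not the row numbers 0..4
means = data.groupby("Country")[listOfRatings].mean().fillna(0)
print(means.index.tolist())  # ['Greece', 'Norway', 'Switzerland', 'USA']

# fillna(DataFrame) aligns on index labels; 0..4 never matches a country name,
# so none of the NaN cells receive a value
filled = data[listOfRatings].fillna(means)
print(filled.isna().sum().sum())  # 6
```

All six NaN cells survive, which matches the "appears to do nothing" symptom.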
Assuming this were the full dataset, this is the result I'd expect:
Country Advanced skiers, freeriders Snow parks
0 Greece 0.0 0.0
1 Switzerland 5.0 5.0
2 USA 0.0 0.0
3 Norway 3.0 4.0
4 Norway 3.0 4.0
Can anyone explain what I'm doing wrong, and how to fix the code?
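Your fillna call appears to do nothing because groupby('Country').mean() returns one row per country, indexed by country name, and fillna aligns a DataFrame argument on its index labels; the row labels 0, 1, 2, ... never match names like 'Norway', so nothing is filled.
You can use transform instead, which returns a new DataFrame with the same size (and the same index) as the original, filled with the aggregated values: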
print (data.groupby('Country')[listOfRatings].transform('mean').fillna(0))
Advanced skiers, freeriders Snow parks
0 0.0 0.0
1 5.0 5.0
2 0.0 0.0
3 3.0 4.0
4 3.0 4.0
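The difference between mean() and transform('mean') is easiest to see by comparing shapes. A minimal sketch, again assuming the five sample rows from the question are the full dataset:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed to be the full dataset)
data = pd.DataFrame({
    "Country": ["Greece", "Switzerland", "USA", "Norway", "Norway"],
    "Advanced skiers, freeriders": [np.nan, 5.0, np.nan, np.nan, 3.0],
    "Snow parks": [np.nan, 5.0, np.nan, np.nan, 4.0],
})
listOfRatings = ["Advanced skiers, freeriders", "Snow parks"]
grouped = data.groupby("Country")[listOfRatings]

# mean(): one row per country (4 groups), indexed by Country
print(grouped.mean().shape)  # (4, 2)

# transform('mean'): one row per original row, reusing data's index,
# so it lines up with data when passed to fillna
per_row = grouped.transform("mean")
print(per_row.shape)  # (5, 2)
print(per_row.index.equals(data.index))  # True
```

Because per_row shares data's index, each NaN is matched with the mean of its own country (NaN values are skipped when averaging, so Norway gets 3.0 and 4.0).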
# dynamically get all column names except Country
listOfRatings = data.columns.difference(['Country'])
df1 = data.groupby('Country')[listOfRatings].transform('mean').fillna(0)
data[listOfRatings] = data[listOfRatings].fillna(df1)
print (data)
Country Advanced skiers, freeriders Snow parks
0 Greece 0.0 0.0
1 Switzerland 5.0 5.0
2 USA 0.0 0.0
3 Norway 3.0 4.0
4 Norway 3.0 4.0
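For reference, here is a self-contained sketch of the approach above on the sample rows (assumed to be the full dataset), next to a variant that is not part of the answer: it fills each group's NaNs with that group's own mean inside transform via a lambda. Both should give the same frame; the lambda version is usually slower on large data because it calls Python code per group.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed to be the full dataset)
data = pd.DataFrame({
    "Country": ["Greece", "Switzerland", "USA", "Norway", "Norway"],
    "Advanced skiers, freeriders": [np.nan, 5.0, np.nan, np.nan, 3.0],
    "Snow parks": [np.nan, 5.0, np.nan, np.nan, 4.0],
})
cols = data.columns.difference(["Country"])

# Approach from the answer: per-row group means, then fillna aligned on the row index
means = data.groupby("Country")[cols].transform("mean").fillna(0)
result_a = data.copy()
result_a[cols] = result_a[cols].fillna(means)

# Variant (alternative technique): fill NaNs inside each group with that group's mean;
# countries with no observed values (Greece, USA) stay NaN until the final fillna(0)
result_b = data.copy()
result_b[cols] = (
    data.groupby("Country")[cols]
    .transform(lambda s: s.fillna(s.mean()))
    .fillna(0)
)

print(result_a.equals(result_b))  # True
print(result_a)
```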