Preprocessing with the sklearn Imputer when a column contains only missing values

Question

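I am trying to use Imputer to handle missing values. I want to keep track of the columns whose values are all missing, because otherwise I cannot tell which columns have been processed. Is it possible to also return the columns that contain only missing values?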

Notes from the Imputer documentation:

When axis=0, columns which only contained missing values at fit are discarded upon transform. When axis=1, an exception is raised if there are rows for which it is not possible to fill in the missing values (e.g., because they only contain missing values).

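import pandas as pd
import numpy as np
# Imputer was removed in scikit-learn 0.22; this import needs an older release.
from sklearn.preprocessing import Imputer

data = {'b1': [1, 2, 3, 4, 5], 'b2': [1, 2, 4, 4, 0], 'b3': [0, 0, 0, 0, 0]}
X = pd.DataFrame(data)
# Treat 0 as the missing-value marker; the default strategy is the column mean.
Imp = Imputer(missing_values=0)
print(Imp.fit_transform(X))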

print(X)
   b1  b2  b3
0   1   1   0
1   2   2   0
2   3   4   0
3   4   4   0
4   5   0   0

Output of Imp.fit_transform(X):
[[ 1.    1.  ]
 [ 2.    2.  ]
 [ 3.    4.  ]
 [ 4.    4.  ]
 [ 5.    2.75]]

Answer 1

The statistics_ attribute of the Imputer class returns the fill value for each column, including the columns that were dropped.

statistics_ : array of shape (n_features,)
The imputation fill value for each feature if axis == 0.

Imp.statistics_
array([3.  , 2.75,  nan])
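On scikit-learn releases where Imputer can no longer be imported, the same check can probably be done with SimpleImputer from sklearn.impute. The following is a minimal sketch, assuming scikit-learn 0.20 or later and the default behaviour of dropping all-missing columns on transform. The lowercase names imp and Xt are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same data as in the question, as a plain array (columns b1, b2, b3).
X = np.array([[1, 1, 0],
              [2, 2, 0],
              [3, 4, 0],
              [4, 4, 0],
              [5, 0, 0]])

# Treat 0 as the missing-value marker and fill with the column mean.
imp = SimpleImputer(missing_values=0, strategy="mean")
Xt = imp.fit_transform(X)

# statistics_ has one entry per input column; NaN marks an all-missing column.
print(imp.statistics_)
# The transformed array has fewer columns because b3 was discarded.
print(Xt.shape)
```

As with Imputer, statistics_ keeps one entry per original column even though the transformed array is narrower, so the NaN positions still identify the dropped columns.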

Example of getting the names of the columns in which all values are missing:

nanmask = np.isnan(Imp.statistics_)

nanmask
array([False, False,  True])

X.columns[nanmask]
Index([u'b3'], dtype='object')
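If the goal is to get an array with the original number of columns back, the NaN mask can also be used to put the dropped columns back in place. This is only a sketch: restore_dropped_columns is a hypothetical helper, not part of scikit-learn, and the two arrays are copied by hand from the outputs shown above so that the snippet runs with NumPy alone.

```python
import numpy as np

# Values copied from the outputs shown above.
statistics = np.array([3.0, 2.75, np.nan])   # Imp.statistics_
transformed = np.array([[1.0, 1.0],
                        [2.0, 2.0],
                        [3.0, 4.0],
                        [4.0, 4.0],
                        [5.0, 2.75]])        # Imp.fit_transform(X)


def restore_dropped_columns(transformed, statistics, fill_value=np.nan):
    """Hypothetical helper: re-insert the columns the imputer discarded."""
    nanmask = np.isnan(statistics)
    n_rows = transformed.shape[0]
    # Start from an array of the original width, filled with fill_value.
    full = np.full((n_rows, statistics.shape[0]), fill_value, dtype=float)
    # Copy the imputed columns back into their original positions.
    full[:, ~nanmask] = transformed
    return full


restored = restore_dropped_columns(transformed, statistics)
print(restored)
```

The restored array has three columns again, with b3 filled with NaN (or any other fill_value passed in), so the column positions match X.columns.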
