Preprocessing with sklearn Imputer when a column contains only missing values
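I am trying to use Imputer to handle missing values.
I would like to keep track of the columns that contain only missing values, because otherwise I cannot tell which of the columns were processed.
Is it possible to also return the columns whose values are all missing?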
Notes from the Imputer documentation:
When axis=0, columns which only contained missing values at fit are
discarded upon transform. When axis=1, an exception is raised if there
are rows for which it is not possible to fill in the missing values
(e.g., because they only contain missing values).
import pandas as pd
import numpy as np
from sklearn.preprocessing import Imputer
# Zeros stand in for missing values; column b3 contains nothing else.
data = {'b1': [1, 2, 3, 4, 5], 'b2': [1, 2, 4, 4, 0], 'b3': [0, 0, 0, 0, 0]}
X = pd.DataFrame(data)
Imp = Imputer(missing_values=0)
print(Imp.fit_transform(X))
print(X)
   b1  b2  b3
0   1   1   0
1   2   2   0
2   3   4   0
3   4   4   0
4   5   0   0
Output of Imp.fit_transform(X):
[[ 1.    1.  ]
 [ 2.    2.  ]
 [ 3.    4.  ]
 [ 4.    4.  ]
 [ 5.    2.75]]
The statistics_ attribute of the Imputer class returns the fill value for each column, including the columns that were dropped.
statistics_ : array of shape (n_features,)
The imputation fill value for each feature if axis == 0.
Imp.statistics_
array([3. , 2.75, nan])
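The Imputer class used above was deprecated in scikit-learn 0.20 and later removed. The following is a minimal sketch of the same check, assuming a newer scikit-learn where `sklearn.impute.SimpleImputer` is available; the variable names `imp`, `transformed`, and `stats` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumption: scikit-learn >= 0.20

X = pd.DataFrame({'b1': [1, 2, 3, 4, 5],
                  'b2': [1, 2, 4, 4, 0],
                  'b3': [0, 0, 0, 0, 0]})

# Treat 0 as the missing-value marker, as in the question.
imp = SimpleImputer(missing_values=0, strategy='mean')
transformed = imp.fit_transform(X)

# statistics_ has one entry per input column; a column with no
# observed values gets NaN and is left out of the transformed array.
stats = imp.statistics_
print(transformed.shape)
print(stats)
```

With the default settings, recent versions may also emit a warning about skipping features without any observed values when the all-missing column is dropped.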
An example of getting the names of the columns whose values are all missing:
nanmask = np.isnan(Imp.statistics_)
nanmask
array([False, False, True])
X.columns[nanmask]
Index([u'b3'], dtype='object')
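Building on the NaN mask, one possible way to keep track of every column is to put the imputed values back into a labeled DataFrame and re-attach the dropped columns unchanged. This is only a sketch: it assumes `SimpleImputer` from a newer scikit-learn in place of the removed Imputer class, and the names `kept_cols`, `dropped_cols`, `imputed`, and `restored` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumption: scikit-learn >= 0.20

X = pd.DataFrame({'b1': [1, 2, 3, 4, 5],
                  'b2': [1, 2, 4, 4, 0],
                  'b3': [0, 0, 0, 0, 0]})

imp = SimpleImputer(missing_values=0, strategy='mean')
values = imp.fit_transform(X)

# Columns with a NaN statistic had no observed values and were dropped.
dropped = np.isnan(imp.statistics_)
kept_cols = X.columns[~dropped]
dropped_cols = X.columns[dropped]

# Label the imputed array with the names of the surviving columns.
imputed = pd.DataFrame(values, columns=kept_cols, index=X.index)

# Re-attach the dropped columns as they were and restore the original order.
restored = imputed.join(X[dropped_cols])[X.columns]
print(restored)
```

Whether the all-missing columns should be restored, filled with a constant, or left out depends on the downstream model; the sketch only shows how to keep the column names aligned.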