Loop for imputation

I ran the imputation on a single variable and assigned the result back to the same variable:

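import pandas as pd
from sklearn.preprocessing import Imputer  # older scikit-learn; removed in 0.22

X = pd.DataFrame(df, columns=['a'])
imp = Imputer(missing_values='NaN', strategy='median', axis=0)
X = imp.fit_transform(X)
df['a'] = X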

But I have many variables and would like to use a loop like this:

f = df[[a, b, c, d, e]]
for k in f:
    X = pd.DataFrame(df, columns=k)
    imp = Imputer(missing_values='NaN', strategy='median', axis=0)
    X = imp.fit_transform(X)
    df.k = X

But I get:

TypeError: Index(...) must be called with a collection of some kind, 'a' was passed

How can I use a loop to impute the variables and assign the results back into the DataFrame?

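Iterating over a DataFrame yields its column names, so in this case k == 'a' rather than the first column itself. Passing that bare string as columns=k is what raises the TypeError. You can make the loop work by selecting each column by name:

f = df[['a', 'b', 'c', 'd', 'e']]
for k in f:
    X = df[[k]]  # double brackets keep X two-dimensional, as the imputer expects
    imp = Imputer(missing_values='NaN', strategy='median', axis=0)
    X = imp.fit_transform(X)
    df[k] = X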
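
To see why the loop variable is a label and not a column, here is a small sketch on a made-up two-column frame (the data and the names `labels`, `series_shape`, and `frame_shape` are only for illustration). It also shows the shape difference between single and double brackets, which matters because scikit-learn transformers generally expect 2-D input.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's DataFrame.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 7.0]})

# Iterating a DataFrame yields the column labels, not the column data.
labels = [k for k in df]

# Single brackets return a 1-D Series; double brackets return a 2-D DataFrame.
series_shape = df["a"].shape
frame_shape = df[["a"]].shape

print(labels, series_shape, frame_shape)
```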

But you probably just want to build a new DataFrame with a column-wise apply. Something like:

df = df.apply(imp.fit_transform, raw=True, broadcast=True)
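
If you are on a recent scikit-learn, where `Imputer` no longer exists, a rough equivalent is to hand all the numeric columns to `sklearn.impute.SimpleImputer` in one call, which avoids the loop entirely. This is only a sketch under that assumption: the toy data and the `cols` list are made up, and the column `c` is there just to show that non-numeric columns can be left out.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, 7.0],
    "c": ["x", "y", "z"],  # non-numeric, deliberately not imputed
})

cols = ["a", "b"]  # columns to impute

# SimpleImputer works column by column, so each column gets its own median.
imp = SimpleImputer(missing_values=np.nan, strategy="median")
df[cols] = imp.fit_transform(df[cols])

print(df)
```

Because the imputer is fitted on all the selected columns together, the learned medians are kept in `imp.statistics_` and can be reused on new data with `imp.transform`.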

Alternatively, pandas has its own methods for handling missing data: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#filling-with-a-pandasobject
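
For median imputation specifically, the pandas-only route could look like the sketch below. It relies on `fillna` accepting a Series of per-column fill values; the toy data and the `cols` list are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, 7.0],
    "c": ["x", "y", "z"],
})

cols = ["a", "b"]  # numeric columns to fill

# median() returns one value per column; fillna matches them up by column label.
df[cols] = df[cols].fillna(df[cols].median())

print(df)
```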

for k in f:
    X = pd.DataFrame(df, columns=[k])
    imp = Imputer(missing_values='NaN', strategy='median', axis=0)
    X = imp.fit_transform(X)
    df[k] = X