Loop for imputation
I imputed a single variable and assigned the result back to the same column:
X = pd.DataFrame(df, columns=['a'])
imp = Imputer(missing_values='NaN', strategy='median', axis=0)
X = imp.fit_transform(X)
df['a'] = X
But I have many variables and would like to use a loop, like this:
f = df[['a', 'b', 'c', 'd', 'e']]
for k in f:
    X = pd.DataFrame(df, columns=k)
    imp = Imputer(missing_values='NaN', strategy='median', axis=0)
    X = imp.fit_transform(X)
    df.k = X
But I get:
TypeError: Index(...) must be called with a collection of some kind, 'a' was passed
How can I use a loop over a DataFrame to impute the variables and write them back?
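Iterating over a DataFrame yields its column names, so here k == 'a' rather than the first column itself. The error comes from columns=k, which passes a bare string where a list of labels is expected. Note also that df.k = X sets an attribute literally named k instead of the column that k names. You can make it work with:
f = df[['a', 'b', 'c', 'd', 'e']]
for k in f:
    # k is a column label such as 'a'; df[[k]] keeps the data 2-D for the imputer
    X = df[[k]]
    imp = Imputer(missing_values='NaN', strategy='median', axis=0)
    X = imp.fit_transform(X)
    df[k] = X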
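For readers on a current scikit-learn: Imputer was removed in version 0.22, and SimpleImputer from sklearn.impute is its replacement (it has no axis argument and takes np.nan rather than the string 'NaN'). Below is a minimal sketch of the same loop under that assumption. The three-row frame and its values are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumption: scikit-learn >= 0.20

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 8.0],
    "c": [5.0, 6.0, 7.0],
})

# Iterating over a DataFrame yields column labels, not column data.
labels = [k for k in df[["a", "b"]]]

for k in labels:
    imp = SimpleImputer(missing_values=np.nan, strategy="median")
    # df[[k]] is a one-column DataFrame (2-D), which fit_transform expects;
    # ravel() flattens the (n, 1) result back to one value per row.
    df[k] = imp.fit_transform(df[[k]]).ravel()
```

Only the columns named in the loop are touched, so "c" stays as it was.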
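But you may just want a column-wise apply to build a new DataFrame, something like:
# raw=True passes each column as a NumPy array; reshape it to 2-D for the imputer
df = df.apply(lambda col: imp.fit_transform(col.reshape(-1, 1)).ravel(), raw=True)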
Alternatively, pandas has its own methods for handling missing data: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#filling-with-a-pandasobject
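As a rough illustration of the pandas-only route, fillna accepts a Series and aligns it with the frame by column label, so passing the per-column medians fills each column with its own median. The sample frame and the cols list are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 8.0],
    "c": [5.0, 6.0, 7.0],
})

cols = ["a", "b"]  # columns to impute

# median() returns a Series indexed by column label;
# fillna matches those labels against the columns of df[cols].
df[cols] = df[cols].fillna(df[cols].median())
```

This avoids both the loop and the scikit-learn dependency, which may be enough when no fitted imputer needs to be reused on other data.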
for k in f:
    X = pd.DataFrame(df, columns=[k])
    imp = Imputer(missing_values='NaN', strategy='median', axis=0)
    X = imp.fit_transform(X)
    df[k] = X
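Since the imputer works column by column on 2-D input anyway, the loop can probably be dropped altogether. The sketch below assumes a current scikit-learn (SimpleImputer in place of the removed Imputer) and uses made-up data. One caveat: by default SimpleImputer discards columns that are entirely missing, so the assignment would fail with a shape mismatch in that case.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumption: scikit-learn >= 0.20

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 8.0],
    "c": [5.0, 6.0, 7.0],
})

cols = ["a", "b"]  # columns to impute

imp = SimpleImputer(missing_values=np.nan, strategy="median")
# One fit computes a separate median for every column in cols.
df[cols] = imp.fit_transform(df[cols])

# The fitted medians are kept in statistics_, in the same order as cols.
medians = dict(zip(cols, imp.statistics_))
```

Keeping the fitted imp around also allows the same medians to be applied to new data with imp.transform.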