'MultiOutputClassifier' object is not iterable when creating a Pipeline (Python)
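I want to create a pipeline that first encodes the categorical columns, then scales the numerical ones, and finally fits an XGBoost classifier for a multilabel problem.
Code block: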
# List the categorical (object-dtype) column names
categorical_columns = X.columns[X.dtypes == 'O'].tolist()
# Distinct values of each categorical column, used as the encoder's categories
unique_list = [X[c].unique().tolist() for c in categorical_columns]
# List the numerical (non-object) column names
numerical_columns = X.columns[X.dtypes != 'O'].tolist()
#Encoding & Scaling objects
scaler = StandardScaler()
ohe = OneHotEncoder(categories=unique_list, sparse=False)
#Define a pipeline
pipeline = Pipeline([("ohe_onestep", ohe.fit_transform(X[categorical_columns])),
("scaler_onestep", scaler.fit_transform(X[numerical_columns])),
MultiOutputClassifier(xgb.XGBClassifier(objective='binary:logistic'))])
# Cross-validate the model
cross_val_scores = cross_val_score(pipeline, X, y,
scoring='accuracy', cv=5)
But when I run the code, I get an error. The offending line and the message are:
> pipeline = Pipeline([("ohe_onestep", ohe.fit_transform(X[categorical_columns])),
'MultiOutputClassifier' object is not iterable
How can I fix this?
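Two things. First, you need to pass the transformers or estimators themselves to the pipeline, not the result of fitting/transforming them (that hands the pipeline the resulting arrays instead of the transformers, and it will fail). The pipeline itself does the fitting/transforming. Second, since you apply specific transformations to specific columns, you need a ColumnTransformer.
Putting these together: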
from sklearn.compose import ColumnTransformer
col_transformers = ColumnTransformer([
# name, transformer itself, columns to apply
("scaler_onestep", scaler, numerical_columns),
("ohe_onestep", ohe, categorical_columns)])
model = MultiOutputClassifier(xgb.XGBClassifier(objective="binary:logistic"))
pipeline = Pipeline([("preprocessing", col_transformers), ("XGB", model)])
Now you can do:
cross_val_scores = cross_val_score(pipeline, X, y,
scoring="accuracy", cv=5)
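For readers who want to run the whole thing end to end, here is a minimal sketch of the steps above on made-up data. The names X_demo and y_demo are hypothetical, and DecisionTreeClassifier is only a stand-in for xgb.XGBClassifier so the example runs without xgboost installed; swap the real estimator back in for actual use. The encoder is left at its defaults because the sparse argument used in the question was renamed to sparse_output in scikit-learn 1.2.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption, not the asker's dataset): two numeric
# columns, one object-dtype categorical column, two binary labels.
rng = np.random.default_rng(0)
n = 100
X_demo = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat": pd.Series(rng.choice(["x", "y", "z"], size=n), dtype=object),
})
y_demo = np.column_stack([
    (X_demo["num_a"] > 0).astype(int),
    (X_demo["num_b"] > 0).astype(int),
])

numerical_columns = ["num_a", "num_b"]
categorical_columns = ["cat"]

# Unfitted transformers go in; the pipeline fits them inside each CV fold.
col_transformers = ColumnTransformer([
    ("scaler_onestep", StandardScaler(), numerical_columns),
    ("ohe_onestep", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
])

# Stand-in estimator; replace with xgb.XGBClassifier(...) in real code.
model = MultiOutputClassifier(DecisionTreeClassifier(max_depth=3, random_state=0))

pipeline = Pipeline([("preprocessing", col_transformers), ("clf", model)])

demo_scores = cross_val_score(pipeline, X_demo, y_demo, scoring="accuracy", cv=5)
print(demo_scores)
```

With a 2-D label matrix, "accuracy" is subset accuracy: a row counts as correct only when every label in it is predicted correctly.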
Also, you can usually use make_column_selector with its dtype_include / dtype_exclude options to let it infer the numerical and categorical columns, as exemplified here.
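A small sketch of that selector-based variant, assuming the categorical columns are stored with object dtype (as the question's X.dtypes == 'O' check implies). The frame X_sel is made-up illustration data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame: two numeric columns and one object-dtype column.
X_sel = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40.0, 55.5, 82.0, 61.2],
    "city": pd.Series(["A", "B", "A", "C"], dtype=object),
})

# The selectors are callables; ColumnTransformer calls them on the
# input frame at fit time to decide which columns each step receives.
col_transformers_sel = ColumnTransformer([
    ("scaler_onestep", StandardScaler(),
     make_column_selector(dtype_exclude=object)),
    ("ohe_onestep", OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_include=object)),
])

transformed = col_transformers_sel.fit_transform(X_sel)
print(transformed.shape)  # 2 scaled columns + 3 one-hot columns
```

This removes the need to build numerical_columns and categorical_columns by hand, at the cost of relying on the dtypes being set correctly in the frame.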
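Finally, the reason for the error you got: Pipeline expects a list of tuples. You did pass tuples for the first two items (the scaler and ohe ones), but you did not pass a (<name>, model) tuple as the third item. Instead you gave it the model directly, and it tried to iterate over it to get the name and so on, but failed.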
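The mechanism can be illustrated in plain Python. This is only a sketch of the idea, not scikit-learn's actual code: FakeEstimator and split_steps are hypothetical names, and the exact error text differs from the one Pipeline produces.

```python
class FakeEstimator:
    """Stand-in for an estimator; like most estimators, it is not iterable."""


def split_steps(steps):
    """Split a list of (name, estimator) pairs into two parallel lists."""
    names, estimators = [], []
    for step in steps:
        # Unpacking iterates over `step`; a bare object cannot be iterated.
        name, estimator = step
        names.append(name)
        estimators.append(estimator)
    return names, estimators


good_steps = [("preprocessing", FakeEstimator()), ("XGB", FakeEstimator())]
bad_steps = [("preprocessing", FakeEstimator()), FakeEstimator()]  # no name

print(split_steps(good_steps)[0])

try:
    split_steps(bad_steps)
except TypeError as exc:
    print("TypeError:", exc)
```

Wrapping the last entry as ("XGB", FakeEstimator()) makes the second call succeed, which mirrors the fix for the pipeline.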