Why am I getting 'last step of pipeline' error when using make_pipeline in scikit-learn?
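So I am trying to use make_pipeline from scikit-learn to clean my data (replace missing values, then clean outliers, then apply an encoding function to the categorical variables) and finally add a random forest regressor through RandomForestRegressor. The input is a DataFrame. Eventually I would like to run this through GridSearchCV to search for the best hyperparameters for the regressor.
In order to do that, I built some custom classes that inherit from the TransformerMixin class, as advised. Here is what I have so far: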
from sklearn.pipeline import make_pipeline
from sklearn.base import TransformerMixin
import pandas as pd

class Cleaning(TransformerMixin):
    def __init__(self, column_labels):
        self.column_labels = column_labels

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        """Given a dataframe X with predictors, clean it."""
        X_imputed, medians_X = median_imputer(X)  # impute all missing numeric data with median
        quantiles_X = get_quantiles(X_imputed, self.column_labels)
        X_nooutliers, _ = replace_outliers(X_imputed, self.column_labels, medians_X, quantiles_X)
        return X_nooutliers

class Encoding(TransformerMixin):
    def __init__(self, encoder_list):
        self.encoder_list = encoder_list

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        """Takes in dataframe X and applies encoding transformation to them"""
        return encode_data(self.encoder_list, X)
However, when I run the following lines of code I get an error:
import category_encoders as ce
from sklearn.ensemble import RandomForestRegressor

pipeline_cleaning = Cleaning(column_labels=train_labels)

OneHot_binary = ce.OneHotEncoder(cols=['new_store'])
OneHot = ce.OneHotEncoder(cols=['transport_availability'])
Count = ce.CountEncoder(cols=['county'])
pipeline_encoding = Encoding([OneHot_binary, OneHot, Count])

baseline = RandomForestRegressor(n_estimators=500, random_state=12)
make_pipeline([pipeline_cleaning, pipeline_encoding, baseline])
The error is: Last step of Pipeline should implement fit or be the string 'passthrough'. I don't understand why.
Edit: there was a slight typo in the last line, now corrected. The third element of the list passed to make_pipeline is the regressor.
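make_pipeline takes its steps as separate positional arguments, not as a single list. Change the line:
make_pipeline([pipeline_cleaning, pipeline_encoding, baseline])
to (no list):
make_pipeline(pipeline_cleaning, pipeline_encoding, baseline)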
Pipeline(steps=[('cleaning', <__main__.Cleaning object at 0x7f617260c1d0>),
                ('encoding', <__main__.Encoding object at 0x7f617260c278>),
                ('randomforestregressor',
                 RandomForestRegressor(n_estimators=500, random_state=12))])
and you are good to go.
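To see why the list triggers this message, here is a minimal sketch that reproduces the behaviour with built-in estimators. StandardScaler stands in for the custom Cleaning and Encoding classes, and the tiny dataset is made up purely for illustration. When a list is passed, make_pipeline treats the list itself as the one and only step, and a plain Python list has no fit method. Depending on the scikit-learn version, the check may run when the pipeline is built or only when fit is called, so the sketch wraps both in the same try block.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset, only here so that fit() can be called.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]]
y = [0.0, 1.0, 2.0, 3.0]

scaler = StandardScaler()  # stand-in for the custom Cleaning/Encoding steps
forest = RandomForestRegressor(n_estimators=10, random_state=12)

# Wrong: make_pipeline receives ONE positional argument, the list itself,
# so the "last step" is a Python list, which has no fit method.
error_message = None
try:
    make_pipeline([scaler, forest]).fit(X, y)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Right: pass the steps as separate arguments, or unpack an existing list with *.
steps = [scaler, forest]
pipe = make_pipeline(*steps)
pipe.fit(X, y)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipe.steps]
print(step_names)
```

Since the step names are generated from the class names, a GridSearchCV parameter grid for the pipeline in the question would address the regressor with keys such as randomforestregressor__n_estimators.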