Use `GridSearchCV` to test the effect of removing a step from the pipeline entirely?
Suppose I am using `GridSearchCV` to search for hyperparameters, and I am also using a `Pipeline` because I (think I) want to preprocess my data:
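import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5)
}
pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])
search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(train_x, train_y)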
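Is there a way to test my hypothesis that including the `scaler` step actually helps (other than just removing it and re-running)?
That is, is there a way to write: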
param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    'scaler': [On, Off]  # pseudocode: switch the step on or off
}
Or should I be approaching this in a different way?
You can do this by passing `'passthrough'` as one of the values for that step in your `param_grid`, like so:
param_grid = {
    'svc__gamma': np.linspace(0.2, 1, 5),
    'scaler': ['passthrough', StandardScaler()]  # 'passthrough' skips the step
}
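To make the answer concrete, here is a minimal end-to-end sketch of that grid. The data is a synthetic stand-in generated with `make_classification` (an assumption, since the asker's `train_x` and `train_y` are not shown), so the winning configuration here says nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for train_x / train_y (assumption, not the asker's data).
train_x, train_y = make_classification(n_samples=200, n_features=8, random_state=0)

pipeline = Pipeline(steps=[("scaler", StandardScaler()), ("svc", SVC())])
param_grid = {
    "svc__gamma": np.linspace(0.2, 1, 5),
    # 'passthrough' skips the step; StandardScaler() keeps it.
    "scaler": ["passthrough", StandardScaler()],
}

search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(train_x, train_y)

# One candidate per (gamma, scaler) pair: 5 gammas x 2 scaler options.
n_candidates = len(search.cv_results_["params"])
best_scaler = search.best_params_["scaler"]
print(n_candidates, "candidates; best scaler setting:", best_scaler)
```

If `best_scaler` comes back as `'passthrough'`, the search preferred the pipeline without scaling; otherwise it is the `StandardScaler` instance from the grid.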
As described in the scikit-learn user guide:
Individual steps may also be replaced as parameters, and non-final steps may be ignored by setting them to 'passthrough':
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import GridSearchCV
>>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
>>> pipe = Pipeline(estimators)
>>> param_grid = dict(reduce_dim=['passthrough', PCA(5), PCA(10)],
...                   clf=[SVC(), LogisticRegression()],
...                   clf__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(pipe, param_grid=param_grid)
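To judge whether a step helps, rather than only seeing the single best candidate, one option is to read `cv_results_` and keep the best mean cross-validation score for each choice of that step. The sketch below does this for the documentation-style grid above. The synthetic dataset, `cv=5`, `max_iter=1000`, and the `best_by_step` dictionary are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data with enough features for PCA(10) (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
param_grid = dict(
    reduce_dim=["passthrough", PCA(5), PCA(10)],
    clf=[SVC(), LogisticRegression(max_iter=1000)],
    clf__C=[0.1, 10, 100],
)
grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
grid_search.fit(X, y)

# Best mean CV score observed for each choice of the reduce_dim step.
best_by_step = {}
results = grid_search.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    label = str(params["reduce_dim"])  # 'passthrough' or the estimator's repr
    best_by_step[label] = max(score, best_by_step.get(label, float("-inf")))

for label, score in best_by_step.items():
    print(f"{label}: {score:.3f}")
```

Comparing the `'passthrough'` entry against the others shows how much the step contributes under its best-tuned companions. Small gaps should be read against the fold-to-fold spread in `std_test_score`.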