Passing parameters to a pipeline's fit() in sklearn

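I have an sklearn pipeline that chains PolynomialFeatures() and LinearRegression(). My goal is to fit the data using different values of degree for the polynomial features and measure the score for each. Here is the code I used: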

steps = [('polynomials',preprocessing.PolynomialFeatures()),('linreg',linear_model.LinearRegression())]
pipeline = pipeline.Pipeline(steps=steps)

scores = dict()
for i in range(2,6):
    params = {'polynomials__degree': i,'polynomials__include_bias': False}
    #pipeline.set_params(**params)
    pipeline.fit(X_train,y=yCO_logTrain,**params)
    scores[i] = pipeline.score(X_train,yCO_logTrain)

scores

I get the error: TypeError: fit() got an unexpected keyword argument 'degree'

Why is this error thrown even though the parameters are named in the <estimator_name>__<parameter_name> format?

According to the sklearn.pipeline.Pipeline documentation:

**fit_params : dict of string -> object
    Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
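To illustrate what the s__p prefix is meant for, here is a minimal sketch that forwards a genuine fit-time argument. LinearRegression.fit() accepts sample_weight, so the key linreg__sample_weight is valid. The data and the uniform weights are made up for the example, and the sketch assumes scikit-learn's default configuration (metadata routing not enabled).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data, invented purely for illustration
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.1, size=50)
weights = np.ones(50)  # hypothetical per-sample weights

pipe = Pipeline([
    ('polynomials', PolynomialFeatures(degree=2, include_bias=False)),
    ('linreg', LinearRegression()),
])

# sample_weight is an argument of LinearRegression.fit(), so the
# 'linreg__' prefix routes it to that step's fit() call.
pipe.fit(X, y, linreg__sample_weight=weights)
r2 = pipe.score(X, y)
```

By contrast, degree is not an argument of PolynomialFeatures.fit(), which is why polynomials__degree fails when passed the same way.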

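This means that parameters passed this way are forwarded directly to the fit() method of step s. If you check the PolynomialFeatures documentation, the degree parameter is used when constructing the PolynomialFeatures object, not in its .fit() method.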
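Since degree is a constructor parameter, one way to make the loop from the question work is to set it with set_params() (the call that is commented out in the question) and then call fit() without extra keyword arguments. The following is only a sketch: the training data is synthetic, and X_train / y_train stand in for the asker's X_train / yCO_logTrain.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the asker's training data (assumption)
rng = np.random.RandomState(0)
X_train = rng.uniform(-1, 1, size=(100, 2))
y_train = X_train[:, 0] ** 3 - X_train[:, 1] + rng.normal(scale=0.05, size=100)

pipe = Pipeline([
    ('polynomials', PolynomialFeatures()),
    ('linreg', LinearRegression()),
])

scores = {}
for degree in range(2, 6):
    # Constructor parameters are changed via set_params, not via fit()
    pipe.set_params(polynomials__degree=degree,
                    polynomials__include_bias=False)
    pipe.fit(X_train, y_train)
    scores[degree] = pipe.score(X_train, y_train)
```

Note that these are training scores, as in the question, so they tend to rise with degree regardless of how well the model generalizes.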

If you want to try different hyperparameters for the estimators/transformers in a pipeline, you can use GridSearchCV as shown here. This is the example code from the link:

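from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV

# calibrated_forest, X and y are assumed to be defined beforehand, as in the linked example
pipe = Pipeline([
    ('select', SelectKBest()),
    ('model', calibrated_forest)])
# Keys follow the <step>__<parameter> convention; nested estimators chain further
param_grid = {
    'select__k': [1, 2],
    'model__base_estimator__max_depth': [2, 4, 6, 8]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)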
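Applied to the pipeline from the question, the same idea might look like the sketch below. The data is synthetic and the grid values simply mirror range(2, 6) from the question; both are assumptions for illustration, not the asker's actual setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data, invented purely for illustration
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = X[:, 0] ** 3 - X[:, 1] + rng.normal(scale=0.05, size=100)

pipe = Pipeline([
    ('polynomials', PolynomialFeatures()),
    ('linreg', LinearRegression()),
])

# Each key is <step name>__<constructor parameter>
param_grid = {
    'polynomials__degree': [2, 3, 4, 5],
    'polynomials__include_bias': [False],
}

# 5-fold cross-validation; for a regressor the default score is R^2
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
best_degree = search.best_params_['polynomials__degree']
```

Unlike the manual loop, the scores in search.cv_results_ come from held-out folds, so they give a fairer comparison between degrees than the training score does.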