Return coefficients from Pipeline object in sklearn

Question

I've fit a Pipeline object with RandomizedSearchCV:

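import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])

# Note: newer scikit-learn releases spell these 'log_loss' and 'clf__max_iter'.
param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                 'clf__alpha': np.linspace(0.15, 0.35),
                 'clf__n_iter': [3, 5, 7]}

sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd,
                                         param_distributions=param_dist_sgd,
                                         cv=3, n_iter=30, n_jobs=-1)

sgd_randomized_pipe.fit(X_train, y_train)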

I want to access the coef_ attribute of the best_estimator_, but I'm unable to do so. I've tried accessing coef_ with the code below.

sgd_randomized_pipe.best_estimator_.coef_

However, I get the following AttributeError...

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the class of my base_estimator_.

What am I doing wrong?

Answer 1

I found one way to do this: chained indexing through the steps attribute...

sgd_randomized_pipe.best_estimator_.steps[1][1].coef_

Is this best practice, or is there another way?
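For reference, here is a minimal sketch of what the steps attribute holds. It uses synthetic data from make_classification and a default SGDClassifier as stand-ins (both are assumptions, not the asker's X_train, y_train, or tuned parameters). steps is a list of (name, estimator) tuples, so steps[1][1] is the second estimator in the pipeline, and steps[-1][1] is the last one regardless of how many steps come before it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; not the asker's X_train / y_train).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', SGDClassifier(random_state=0))])
pipe.fit(X, y)

# steps is a list of (name, estimator) tuples in pipeline order.
name, clf = pipe.steps[1]
last_name, last_est = pipe.steps[-1]

# For a binary problem, coef_ has one row and one column per feature.
coef = clf.coef_
```

Positional indexing works, but it silently points at a different estimator if a step is later inserted before the classifier.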

Answer 2

You can always use the names you assigned to the steps when creating the pipeline, through the named_steps dict.

scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']

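Then access any attribute available on the corresponding fitted estimator, such as coef_, intercept_, etc.

This is the formal attribute exposed by the Pipeline, as specified in the documentation: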

named_steps : dict

Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.
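Putting that together end to end, a small sketch might look like the following. The data is synthetic and the search space is reduced to a single parameter so the example runs quickly; those choices, and the variable names search and best_pipe, are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; not the asker's X_train / y_train).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', SGDClassifier(random_state=0))])

# Reduced search space (assumption) to keep the sketch fast.
search = RandomizedSearchCV(
    pipe,
    param_distributions={'clf__alpha': np.linspace(0.15, 0.35, 10)},
    n_iter=3, cv=3, random_state=0)
search.fit(X, y)

# best_estimator_ is the refitted Pipeline; its steps are looked up by name.
best_pipe = search.best_estimator_
scaler = best_pipe.named_steps['scl']
classifier = best_pipe.named_steps['clf']

coef = classifier.coef_            # learned weights of the SGDClassifier
intercept = classifier.intercept_  # learned bias term
feature_means = scaler.mean_       # per-feature means learned by the scaler
```

Note that coef_ here refers to the standardized features, since the classifier only ever sees the output of the scaler.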

Answer 3

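I think this should work (going through best_estimator_, since named_steps lives on the fitted pipeline rather than on the search object):

sgd_randomized_pipe.best_estimator_.named_steps['clf'].coef_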
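As a quick check of where each attribute lives, the sketch below uses hasattr on the three objects involved. It runs on synthetic data with a tiny search space (both assumptions for illustration). The search object wraps the pipeline, and the pipeline wraps the classifier, so coef_ is only found at the innermost level.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; not the asker's X_train / y_train).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', SGDClassifier(random_state=0))])
search = RandomizedSearchCV(
    pipe,
    param_distributions={'clf__alpha': np.linspace(0.15, 0.35, 5)},
    n_iter=2, cv=3, random_state=0)
search.fit(X, y)

# The search object itself does not expose the pipeline's named_steps.
search_has_named_steps = hasattr(search, 'named_steps')

# The fitted Pipeline does not expose the classifier's coef_ ...
pipeline_has_coef = hasattr(search.best_estimator_, 'coef_')

# ... but the classifier step inside it does.
clf_has_coef = hasattr(search.best_estimator_.named_steps['clf'], 'coef_')
```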

python

scikit-learn

cross-validation

scikit-learn-pipeline