Why do we need to fit the model again in order to get the score?

Question

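I am testing embedded methods for feature selection.

My understanding (which may be wrong) is that with an embedded method we get the best features (based on feature importance) while training the model.

That being the case, I want to get the score of the trained model (the model that was trained in order to select the features).

I am testing this on a classification problem using the Lasso (L1-penalty) approach.

When I try to get the score, I get an error telling me that I need to fit the model again.

Why do I need to do this? (It seems like a waste of time if the model was already fitted during feature selection.)
Why can't we do it in one pass (select the features and get the model's score)?

If we are using an embedded method, why do we need two stages? Why can't we select the best features in the same fit that trains the model?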
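As a sketch of the embedded idea itself, assuming synthetic data from `make_classification` and `C=0.1` (neither comes from the question), a single fit of an L1-penalized `LogisticRegression` both pushes some coefficients toward zero, which is the feature selection, and produces a model that can be scored directly. Whether any coefficient ends up exactly zero depends on `C` and the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 20 features, only 5 of them informative
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One fit: the L1 penalty shrinks weak features' weights, possibly to exactly zero
model = LogisticRegression(C=0.1, penalty='l1', solver='liblinear')
model.fit(x_train, y_train)

# Indices of features that kept a nonzero weight (the "selected" features)
selected = np.flatnonzero(model.coef_[0])
print("selected features:", selected)

# The same fitted model can be scored without any refit
print("test accuracy:", model.score(x_test, y_test))
```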

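from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

# L1-penalized ("lasso") logistic regression used as the embedded selector;
# x_train, y_train, x_test, y_test are assumed to be defined already
estimator = LogisticRegression(C=1, penalty='l1', solver='liblinear')
selection = SelectFromModel(estimator)
selection.fit(x_train, y_train)
print(estimator.score(x_test, y_test))  # raises NotFittedError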

Error:

sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
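The error comes from the fact that `SelectFromModel.fit` does not modify the estimator object passed to it: it fits a clone and stores that clone as `selection.estimator_`. The sketch below checks this with `sklearn.utils.validation.check_is_fitted`; the synthetic data and the helper `is_fitted` are illustrative assumptions, not part of the question.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils.validation import check_is_fitted

# Synthetic data (assumption) standing in for the question's x_train / y_train
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = LogisticRegression(C=1, penalty='l1', solver='liblinear')
selection = SelectFromModel(estimator).fit(x_train, y_train)

def is_fitted(model):
    """Hypothetical helper: True if check_is_fitted does not raise."""
    try:
        check_is_fitted(model)
        return True
    except NotFittedError:
        return False

print(is_fitted(estimator))               # False: the object we created
print(is_fitted(selection.estimator_))    # True: the internal fitted clone
print(selection.estimator_ is estimator)  # False: they are different objects
```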

Answer 1

The fitted estimator is exposed as selection.estimator_ (see the docs), so after fitting selection you can simply do:

selection.estimator_.score(x_test, y_test)
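A self-contained version of this fix, assuming synthetic data from `make_classification` in place of the question's dataset, is sketched below. It scores the internally fitted model and also shows `get_support()` and `transform()` for inspecting the selected features, all from a single `fit` call. Note that `selection.estimator_` was trained on all columns, so it is scored on the full `x_test`, not on the transformed one.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 300 samples, 20 features
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

selection = SelectFromModel(
    LogisticRegression(C=1, penalty='l1', solver='liblinear'))
selection.fit(x_train, y_train)  # the only fit

# Score the clone that SelectFromModel fitted (trained on all 20 features)
score = selection.estimator_.score(x_test, y_test)

# Boolean mask of the features that passed the selection threshold
mask = selection.get_support()

# Keep only the selected columns, e.g. for a downstream model
x_test_selected = selection.transform(x_test)

print(score, mask.sum(), x_test_selected.shape)
```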
