How to use sklearn RFECV to select the optimal features to pass to a dimensionality reduction step before fitting my estimator
How can I use the sklearn RFECV method to select the optimal features to pass to LinearDiscriminantAnalysis(n_components=2) for dimensionality reduction, before fitting my estimator with KNN?
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.feature_selection import RFECV
import matplotlib.pyplot as plt

pipeline = make_pipeline(Normalizer(), LinearDiscriminantAnalysis(n_components=2), KNeighborsClassifier(n_neighbors=10))

X = self.dataset
y = self.postures

min_features_to_select = 1  # Minimum number of features to consider
rfecv = RFECV(pipeline, step=1, cv=None, scoring='f1_weighted',
              min_features_to_select=min_features_to_select)
rfecv.fit(X, y)

print(rfecv.support_)
print(rfecv.ranking_)
print("Optimal number of features : %d" % rfecv.n_features_)

# Plot number of features VS. cross-validation scores
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
plt.plot(range(min_features_to_select,
               len(rfecv.grid_scores_) + min_features_to_select),
         rfecv.grid_scores_)
plt.show()
I get the following error from this code. If I run the code without the LinearDiscriminantAnalysis() step it works, but that step is an important part of my processing.
*** ValueError: when `importance_getter=='auto'`, the underlying estimator Pipeline should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.
There is a fundamental problem with your approach: KNeighborsClassifier has no intrinsic measure of feature importance. It is therefore incompatible with RFECV, whose documentation states for the estimator:
A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.
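You can check this directly (a throwaway snippet, not part of the original post): after fitting, KNeighborsClassifier exposes neither attribute, whereas a RandomForestClassifier, for example, does.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Tiny toy data, only to get fitted estimators for the attribute check.
X_toy, y_toy = [[0, 1], [1, 0], [1, 1], [0, 0]], [0, 1, 1, 0]

knn = KNeighborsClassifier(n_neighbors=2).fit(X_toy, y_toy)
rf = RandomForestClassifier(n_estimators=10).fit(X_toy, y_toy)

# KNN offers neither attribute, so RFECV's importance_getter='auto' has nothing to use.
print(hasattr(knn, "coef_"), hasattr(knn, "feature_importances_"))  # False False
print(hasattr(rf, "feature_importances_"))                          # True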
It is guaranteed to fail with KNeighborsClassifier; you need a different classifier, for example RandomForestClassifier or SVC.
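For comparison, here is a minimal sketch (on synthetic make_classification data, not your dataset) of RFECV driven directly by a RandomForestClassifier, which does provide feature_importances_:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic data purely for illustration.
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=4, random_state=0)

rfecv = RFECV(RandomForestClassifier(random_state=0), step=1, cv=5,
              scoring='f1_weighted', min_features_to_select=1)
rfecv.fit(X_demo, y_demo)

print(rfecv.support_)  # boolean mask of the selected features
print("Optimal number of features : %d" % rfecv.n_features_)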
If switching to another classifier is an option for you, your pipeline still needs to expose the feature importances of its final estimator. For that you can refer to this approach, which defines a custom pipeline class for exactly that purpose:
from sklearn.pipeline import Pipeline

class MyPipeline(Pipeline):
    # Expose the final estimator's importances so that RFECV can find them.
    @property
    def coef_(self):
        return self._final_estimator.coef_

    @property
    def feature_importances_(self):
        return self._final_estimator.feature_importances_
Then define your pipeline like this:
pipeline = MyPipeline([
    ('normalizer', Normalizer()),
    ('ldm', LinearDiscriminantAnalysis(n_components=2)),
    ('rf', RandomForestClassifier())
])
With that, it should work.
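Putting it together, a rough end-to-end sketch of handing the custom pipeline defined above to RFECV. The data here is synthetic (make_classification with 3 classes) rather than your self.dataset / self.postures, because LinearDiscriminantAnalysis(n_components=2) needs at least 3 classes, and min_features_to_select is set to 2 because LDA with 2 components also needs at least 2 input features at every elimination step.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV

# Synthetic stand-in for self.dataset / self.postures; 3 classes because
# LDA can extract at most n_classes - 1 components.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=5,
                                     n_classes=3, n_clusters_per_class=1,
                                     random_state=0)

# min_features_to_select=2 so LDA(n_components=2) always has enough features to fit.
rfecv = RFECV(pipeline, step=1, cv=5, scoring='f1_weighted',
              min_features_to_select=2)
rfecv.fit(X_demo, y_demo)

print(rfecv.support_)
print("Optimal number of features : %d" % rfecv.n_features_)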