How to get the coefficients from RFE using sklearn?

Question

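I am using Recursive Feature Elimination (RFE) for feature selection. It works by repeatedly taking an estimator, such as an SVM classifier, fitting it to the data, and removing the features with the lowest weights (coefficients).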

I am able to fit it to the data and perform feature selection. However, I want to recover the learned weight of each feature from RFE.

I initialize the classifier object and the RFE object with the following code and fit them to the data:

from sklearn.svm import SVC
from sklearn.feature_selection import RFE

svc = SVC(C=1, kernel="linear")
rfe = RFE(estimator=svc, n_features_to_select=300, step=0.1)
rfe.fit(all_training, training_labels)

Then I try to print the coefficients:

print ('coefficients',svc.coef_)

and get:

AttributeError: 'RFE' object has no attribute 'dual_coef_'

According to the sklearn documentation, the classifier object should have this attribute:

coef_ : array, shape = [n_class-1, n_features]
Weights assigned to the features (coefficients in the primal problem). This is only
available in the case of a linear kernel.
coef_ is a readonly property derived from dual_coef_ and support_vectors_.

I am using a linear kernel, so that is not the problem.

Can anyone explain why I can't recover the coefficients? Is there a workaround?
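A minimal sketch of what is likely going on, using synthetic data from make_classification in place of all_training and training_labels (an assumption; the feature count is scaled down to 50 so it runs quickly). RFE follows the usual scikit-learn convention of fitting clones of the estimator it is given, so the svc object itself is never fitted, and an unfitted SVC has no dual_coef_ from which the linear-kernel coef_ property could be derived.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for all_training / training_labels (illustrative sizes only).
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

svc = SVC(C=1, kernel="linear")
rfe = RFE(estimator=svc, n_features_to_select=10, step=0.1)
rfe.fit(X, y)

# RFE fits clones of `svc`, so the original object stays unfitted.
print(hasattr(svc, "coef_"))       # False: svc itself was never fitted
print(rfe.estimator_ is svc)       # False: estimator_ is a separate, fitted clone
print(rfe.estimator_.coef_.shape)  # (1, 10): one row for a binary problem, 10 kept features
```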

Answer 1

Two minutes after posting, I looked at the RFE documentation again and found a partial solution.

The RFE object stores the fitted estimator as an attribute (estimator_), so I can call

print ('coefficients',rfe.estimator_.coef_)

and get the coefficients of the selected features (i.e., this returns the coefficients of the top 300 features, because I set n_features_to_select=300 earlier).
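Because estimator_ is refit only on the selected columns, column j of its coef_ belongs to the j-th selected feature, not to column j of the original data. A sketch of mapping the weights back to original column indices with get_support(indices=True), again on synthetic make_classification data (an assumption) and assuming a binary problem so that coef_ has a single row:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the question's data (illustrative sizes only).
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

rfe = RFE(estimator=SVC(C=1, kernel="linear"), n_features_to_select=10, step=0.1)
rfe.fit(X, y)

# Original column indices of the kept features, in ascending order.
kept = rfe.get_support(indices=True)

# estimator_ was refit on X[:, kept]; binary problem, so coef_ has one row.
coefs = rfe.estimator_.coef_.ravel()
for idx, w in zip(kept, coefs):
    print(f"feature {idx}: {w:+.4f}")
```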

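However, I still can't get the coefficients of the remaining features that were not selected. On each iteration, RFE trains the classifier and obtains new coefficients for every remaining feature. Ideally, I would like to access the coefficients learned in every iteration.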

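(So if I start with 3000 features and use a step of 300 features, in the first iteration I want to access 3000 coefficients, in the next iteration 2700 coefficients for the remaining 2700 features, in the third iteration 2400 coefficients, and so on.)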
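One possible workaround, assuming scikit-learn 0.24 or later: RFE accepts a callable importance_getter, which is called with the estimator fitted in an elimination round and must return the importances used for ranking. The hypothetical helper record_coef below saves a copy of each round's coef_ before returning it. The getter only sees the fitted estimator, not the column indices, so the sketch reconstructs which original columns were present in each round from ranking_. The data is synthetic and the sizes (30 features, step=5, 10 kept) are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the question's data (illustrative sizes only).
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)

# One entry per elimination round: a copy of that round's coefficient matrix.
coef_history = []


def record_coef(estimator):
    """Hypothetical helper: store the fitted estimator's coef_, then return it for ranking."""
    coef_history.append(np.array(estimator.coef_, copy=True))
    return estimator.coef_


rfe = RFE(
    estimator=SVC(C=1, kernel="linear"),
    n_features_to_select=10,
    step=5,
    importance_getter=record_coef,  # callable getters need scikit-learn >= 0.24
)
rfe.fit(X, y)

n_rounds = len(coef_history)
for k, coef in enumerate(coef_history, start=1):
    # Features present in round k were eliminated in round k or later (or kept),
    # i.e. ranking_ <= n_rounds - k + 2; coef_ lists them in ascending column order.
    cols = np.flatnonzero(rfe.ranking_ <= n_rounds - k + 2)
    print(f"round {k}: {coef.shape[1]} coefficients for original columns {cols.tolist()}")
```

With these settings the loop should report four rounds with 30, 25, 20 and 15 coefficients. The final 10 features are fitted once more after the loop without going through the getter, so their weights come from rfe.estimator_.coef_ as above. The sketch relies on RFE calling the getter once per round, which matches how current releases behave but is an implementation detail rather than a documented guarantee.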

Answer 2

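from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

reg = LogisticRegression()
# Replace 10 with the number of features you want to select
rfe = RFE(reg, n_features_to_select=10)
# X: feature matrix, Y: target labels
rfe.fit(X, Y)
# Boolean mask: True for every feature RFE kept
print(rfe.support_)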

This tells you which features are important (support_ marks the selected features), and it's a better way of looking at it.
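To complement support_, RFE also exposes ranking_, where selected features get rank 1 and features eliminated earlier get larger ranks. A small sketch on synthetic make_classification data (an assumption; 8 features, keeping 3, with the default step=1):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

rfe = RFE(LogisticRegression(), n_features_to_select=3)
rfe.fit(X, y)

# support_: True for kept features; ranking_: 1 for kept, larger = eliminated earlier.
for i, (kept, rank) in enumerate(zip(rfe.support_, rfe.ranking_)):
    print(f"feature {i}: kept={kept}, rank={rank}")
print("kept columns:", np.flatnonzero(rfe.support_))
```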
