How to get the selected features after LinearSVC reduction in scikit

Question

The title says it all. I have looked at the scikit docs, which are very poor for this particular task, and I have checked several online resources, including this post.

However, they seem to be wrong. For feature selection, we can do:

clf = LinearSVC(penalty="l1", dual=False, random_state=0)
X_reduced = clf.fit_transform(X_full, y_full)

Now, if we check the shape of X_reduced, it is very clear how many features were selected. So the question is: which ones?
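On current scikit-learn releases the estimator-as-transformer call used above (`clf.fit_transform`) is no longer available; the usual replacement is to wrap the estimator in `SelectFromModel`, which can also report exactly which columns it kept. A minimal sketch, assuming a recent scikit-learn and using synthetic `make_classification` data as a hypothetical stand-in for `X_full`/`y_full` (the sizes are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Hypothetical stand-in for X_full / y_full; sizes are arbitrary.
X_full, y_full = make_classification(n_samples=200, n_features=50,
                                     n_informative=5, random_state=0)

# Wrap the L1-penalized LinearSVC in SelectFromModel instead of calling
# fit_transform on the classifier itself.
selector = SelectFromModel(LinearSVC(penalty="l1", dual=False, random_state=0))
X_reduced = selector.fit_transform(X_full, y_full)

# Boolean mask over the original columns, and the integer indices it keeps.
mask = selector.get_support()
selected_idx = selector.get_support(indices=True)

print(X_reduced.shape)
print(selected_idx)
```

Because the mask comes from the selector itself, `X_full[:, selected_idx]` reproduces `X_reduced` column for column, so the count and the identities of the features cannot disagree.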

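The usual advice is that LinearSVC's coef_ attribute is the key: iterate over it and select the features whose coefficient is non-zero. Well, that is wrong, but it gets you very close to the true result.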
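The `coef_`-based approach described above amounts to collecting the column indices whose coefficient is not exactly zero. A minimal sketch on hypothetical synthetic data; summing absolute values over rows is an assumption added so the same code works for binary (`coef_` of shape `(1, n_features)`) and multi-class models:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Hypothetical synthetic data; sizes are arbitrary.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
clf = LinearSVC(penalty="l1", dual=False, random_state=0).fit(X, y)

# coef_ is (1, n_features) for binary problems and (n_classes, n_features)
# for multi-class ones; collapse it to one magnitude per feature.
coef = np.abs(clf.coef_).sum(axis=0)

# Indices of every feature whose coefficient is not exactly zero.
nonzero_idx = np.flatnonzero(coef)
print(len(nonzero_idx), "of", X.shape[1], "features have a non-zero coefficient")
```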

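After inspecting X_reduced, I noticed that 310 features were selected. That much is certain, since I am looking at the resulting matrix. But if I use the coef_ approach, 414 of the 2000 features are selected, which is close to the true number but not equal to it.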

According to the scikit LinearSVC docs, threshold=None involves mean(X), but I am stuck and don't know what to do next.
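As far as I can tell from the `SelectFromModel` documentation, `threshold=None` does not fall back to the mean for an estimator with `penalty="l1"`: it uses a fixed cutoff of `1e-5` on each feature's absolute coefficient (its L1 norm across classes), and `"mean"` only for other estimators. Features with 0 < |coef| < 1e-5 would therefore be dropped by the selector but still counted by `coef_ != 0`, which would explain a gap like 310 vs 414. A sketch comparing the default cutoff with a machine-epsilon cutoff, on hypothetical synthetic data and assuming a recent scikit-learn (where the resolved cutoff is exposed as `threshold_`):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Hypothetical synthetic data; sizes are arbitrary.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
svc = LinearSVC(penalty="l1", dual=False, random_state=0)

# threshold=None: because svc.penalty == "l1", the cutoff resolves to 1e-5.
default_sel = SelectFromModel(svc).fit(X, y)
print(default_sel.threshold_)  # 1e-05

# Per-feature importance as SelectFromModel computes it from coef_:
# the L1 norm of each column (just |coef| for a binary problem).
coef = np.abs(default_sel.estimator_.coef_).sum(axis=0)
print("selected with 1e-5 cutoff:", default_sel.get_support().sum())
print("non-zero coefficients:    ", np.count_nonzero(coef))

# A machine-epsilon cutoff keeps essentially every non-zero coefficient.
eps_sel = SelectFromModel(svc, threshold=np.finfo(float).eps).fit(X, y)
print("selected with eps cutoff: ", eps_sel.get_support().sum())
```

On a given dataset the two counts may or may not differ; they only diverge when some coefficients are non-zero but smaller than `1e-5`.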

UPDATE: here is a link with data and code that reproduce the error; it is only a few KB.

Answer 1

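I think LinearSVC() does return the features with non-zero coefficients. Could you upload a sample data file and a script that reproduce the inconsistency you are seeing (e.g., via a Dropbox shared link)?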

from sklearn.datasets import load_svmlight_file
from sklearn.svm import LinearSVC
import numpy as np

X, y = load_svmlight_file("/home/Jian/Downloads/errorScikit/weirdData")

transformer = LinearSVC(penalty='l1', dual=False, random_state=0)
transformer.fit(X, y)
# Set the threshold to machine epsilon so that every non-zero coefficient is kept.
# (np.float was removed in NumPy 1.24; the builtin float is equivalent here.)
# Note: calling transform() on the estimator is the pre-0.19 scikit-learn API.
X_reduced = transformer.transform(X, threshold=np.finfo(float).eps)

print(str(X_reduced.shape[1]) + " is NOW equal to " + str((transformer.coef_ != 0).sum()))

414 is NOW equal to 414


# As suggested by user3914041, if you want both sides to be 310:
transformer.transform(X).shape

Out[46]: (62, 310)

(abs(transformer.coef_) > 1e-5).sum()

Out[47]: 310


svm

feature-selection

scikit-learn