How can we get the names (headers) of the selected and omitted features (columns) using scikit-learn?
Let me explain the scenario with a small piece of data.
Example dataset:
GA_ID PN_ID PC_ID MBP_ID GR_ID AP_ID class
0.033 6.652 6.681 0.194 0.874 3.177 0
0.034 9.039 6.224 0.194 1.137 0 0
0.035 10.936 10.304 1.015 0.911 4.9 1
0.022 10.11 9.603 1.374 0.848 4.566 1
0.035 2.963 17.156 0.599 0.823 9.406 1
0.033 10.872 10.244 1.015 0.574 4.871 1
0.035 21.694 22.389 1.015 0.859 9.259 1
0.035 10.936 10.304 1.015 0.911 4.9 1
0.035 10.936 10.304 1.015 0.911 4.9 1
0.035 10.936 10.304 1.015 0.911 4.9 0
0.036 1.373 12.034 0.35 0.259 5.723 0
0.033 9.831 9.338 0.35 0.919 4.44 0
Feature selection step 1 and its result: VarianceThreshold
PN_ID PC_ID MBP_ID GR_ID AP_ID class
6.652 6.681 0.194 0.874 3.177 0
9.039 6.224 0.194 1.137 0 0
10.936 10.304 1.015 0.911 4.9 1
10.11 9.603 1.374 0.848 4.566 1
2.963 17.156 0.599 0.823 9.406 1
10.872 10.244 1.015 0.574 4.871 1
21.694 22.389 1.015 0.859 9.259 1
10.936 10.304 1.015 0.911 4.9 1
10.936 10.304 1.015 0.911 4.9 1
10.936 10.304 1.015 0.911 4.9 0
1.373 12.034 0.35 0.259 5.723 0
9.831 9.338 0.35 0.919 4.44 0
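If step 1 above were done with scikit-learn's VarianceThreshold on a pandas DataFrame, the kept and dropped column names can be read back from the selector's get_support() mask. This is only a minimal sketch: the DataFrame contents (a few rows of the example table), the use of pandas, and the threshold value are my own illustrative assumptions, not something stated in the question.

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# illustrative assumption: a few rows of the example table loaded into a DataFrame
df = pd.DataFrame({
    "GA_ID":  [0.033, 0.034, 0.035, 0.022],
    "PN_ID":  [6.652, 9.039, 10.936, 10.11],
    "PC_ID":  [6.681, 6.224, 10.304, 9.603],
    "MBP_ID": [0.194, 0.194, 1.015, 1.374],
    "GR_ID":  [0.874, 1.137, 0.911, 0.848],
    "AP_ID":  [3.177, 0.0, 4.9, 4.566],
    "class":  [0, 0, 1, 1],
})
X = df.drop(columns="class")
y = df["class"]

selector = VarianceThreshold(threshold=0.001)  # threshold chosen only for illustration
selector.fit(X)

mask = selector.get_support()     # boolean mask over the original columns
kept = X.columns[mask]
dropped = X.columns[~mask]
print("kept:   ", list(kept))     # with this threshold: PN_ID, PC_ID, MBP_ID, GR_ID, AP_ID
print("dropped:", list(dropped))  # with this threshold: GA_ID

Because the mask is indexed against the DataFrame's columns, the names of both the surviving and the removed features stay visible at every step.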
Feature selection step 2 and its result: tree-based feature selection (e.g. from sklearn.ensemble import ExtraTreesClassifier)
PN_ID MBP_ID GR_ID AP_ID class
6.652 0.194 0.874 3.177 0
9.039 0.194 1.137 0 0
10.936 1.015 0.911 4.9 1
10.11 1.374 0.848 4.566 1
2.963 0.599 0.823 9.406 1
10.872 1.015 0.574 4.871 1
21.694 1.015 0.859 9.259 1
10.936 1.015 0.911 4.9 1
10.936 1.015 0.911 4.9 1
10.936 1.015 0.911 4.9 0
1.373 0.35 0.259 5.723 0
9.831 0.35 0.919 4.44 0
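For step 2, a tree-based selector such as SelectFromModel wrapped around ExtraTreesClassifier exposes the same get_support() mask, so column names can again be recovered. A sketch that continues from the VarianceThreshold example above (X, y and kept are reused from it; the hyperparameters are illustrative, not taken from the question):

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X1 = X.loc[:, kept]  # data remaining after step 1

forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X1, y)

sfm = SelectFromModel(forest, prefit=True)  # default threshold: the mean feature importance
mask2 = sfm.get_support()
print("kept after step 2:   ", list(X1.columns[mask2]))
print("dropped after step 2:", list(X1.columns[~mask2]))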
From this we can conclude that we started with 6 columns (features) plus a class label and, after the final step, ended up with 4 features plus the class label. The GA_ID and PC_ID columns were removed, and the model was built with the PN_ID, MBP_ID, GR_ID and AP_ID features.
But unfortunately, when I perform feature selection with the methods available in the scikit-learn library, I find that they only return the shape of the data and the reduced data itself, without the names of the selected and omitted features.
I have written a lot of clumsy Python code (I am not a very experienced programmer) trying to find the answer, but with no success.
Please show me a way out of this. Thanks.
(Note: for this post in particular, I never actually ran any feature selection method on the sample dataset above; I simply removed columns at random to illustrate the case.)
Maybe this code, with the explanation in the comments, will help (adapted from here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Build a forest and compute the feature importances
forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_  # importance of each feature
idx = np.arange(X.shape[1])  # index array covering all features
features_to_keep = idx[importances > np.mean(importances)]  # keep only features whose importance is above the mean
print(features_to_keep.shape)  # should contain roughly 3 indices
x_feature_selected = X[:, features_to_keep]  # pull the X columns corresponding to the most important features
print(x_feature_selected)
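If the features come from a named table instead of a bare NumPy array, the same index array maps straight back to column names. A small follow-up sketch (the name list here is a made-up placeholder, only to show the mapping):

# hypothetical column names, just to illustrate index-to-name mapping
feature_names = np.array(["feat_%d" % i for i in range(X.shape[1])])
print(feature_names[features_to_keep])                               # selected feature names
print(np.setdiff1d(feature_names, feature_names[features_to_keep]))  # omitted feature names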