How to get the actual selected features from sklearn SelectKBest

Given the following data:

import pandas as pd
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
import io


df = pd.read_csv(
    io.StringIO(
        "noise_0,x0,x1,y\n1.0322600657764203,10.354468012163927,7.655143584899129,168.06121374114608\n4.478935261759052,8.786243147880384,6.244283164157256,156.570749155167\n9.085955030930956,10.450548129254543,8.084427493431185,152.10261405911672\n2.9361414837367947,10.869778308219216,9.165630427431644,129.72126680171317\n2.877753385863487,11.236593954599316,5.7987616455741575,55.294961794556315\n1.3002857211827767,9.111226379916955,10.289447419679227,308.7475968288771\n0.19366957870297075,9.753313270715008,9.803181441185592,163.337342478704\n6.788355329398909,9.752270042969856,9.004988677803736,271.9442757290742\n2.1162811600005904,8.67161845864426,9.801711898528824,158.09622149503954\n2.655466593722262,8.830913103331573,6.632544281651334,316.23912914041557\n"
    )
)

which looks like:

    noise_0         x0         x1           y
0  1.032260  10.354468   7.655144  168.061214
1  4.478935   8.786243   6.244283  156.570749
2  9.085955  10.450548   8.084427  152.102614
3  2.936141  10.869778   9.165630  129.721267
4  2.877753  11.236594   5.798762   55.294962
5  1.300286   9.111226  10.289447  308.747597
6  0.193670   9.753313   9.803181  163.337342
7  6.788355   9.752270   9.004989  271.944276
8  2.116281   8.671618   9.801712  158.096221
9  2.655467   8.830913   6.632544  316.239129

and has the following correlation matrix:


|         |   noise_0 |        x0 |        x1 |         y |
|:--------|----------:|----------:|----------:|----------:|
| noise_0 |  1        |  0.159642 | -0.208966 | -0.02006  |
| x0      |  0.159642 |  1        | -0.197431 | -0.620964 |
| x1      | -0.208966 | -0.197431 |  1        |  0.304241 |
| y       | -0.02006  | -0.620964 |  0.304241 |  1        |

I would like to know how to retrieve the names of the selected variables (x0, x1) from sklearn's feature selection.

When I try the following:

X_new = SelectKBest(f_regression, k=2).fit(df.drop("y", axis=1), df["y"])

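I expect it to select x0 and x1, but I am not sure how to determine which features it actually selected.

SelectKBest provides a get_support() method that returns a boolean mask showing which features were selected.

Rearrange the code so that the SelectKBest instance is kept in a variable: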

selector = SelectKBest(f_regression, k=2)
X = df.drop("y", axis=1)
X_new = selector.fit_transform(X, df["y"])

Now, running selector.get_support() gives us:

[False,  True,  True]

We can then use selector.get_support() to mask the columns of X:

X.columns.values[selector.get_support()]

The final output:

['x0', 'x1']