Keeping track of feature names when doing feature selection

Question

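When using sklearn's feature_selection functions for feature selection, is there a way to keep track of the actual feature names rather than the default "f1", "f2", etc.? I have a large number of features, so I can't track them by hand. Obviously I could write code to do this, but I'd like to know whether there is a simple option I can set.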

Answer 1

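If your data is in a pandas DataFrame, you can recover the names of the columns the selector kept. All you need is the selector's get_support method, which returns a boolean mask over the input columns.

Here is a simple example, lightly adapted from the official documentation.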

import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X = [[ 0.87, -1.34,  0.31, 0],
     [-2.79, -0.02, -0.85, 1],
     [-1.34, -0.48, -2.55, 0],
     [ 1.92,  1.48,  0.65, 1]]

# Put the data in a DataFrame so that every feature has a name
df = pd.DataFrame(X, columns=['col1', 'col2', 'col3', 'label'])
train_x = df.loc[:, ['col1', 'col2', 'col3']]
y = df.label
selector = SelectFromModel(estimator=LogisticRegression()).fit(train_x, y)

# get_support() returns a boolean mask: True for each selected column
col_index = selector.get_support()
print(train_x.columns[col_index])
# output print --> Index(['col2'], dtype='object')

machine-learning

feature-selection

scikit-learn