Solution for "nan" score in step forward selection using Python

I am running step forward feature selection using mlxtend's sequential feature selector (SFS).

x_train, x_test = train_test_split(x, test_size = 0.2, random_state = 0)
y_train, y_test = train_test_split(y, test_size = 0.2, random_state = 0)
sfs = SFS(RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1),
          k_features=28,
          forward=True,
          floating=False,
          verbose=2,
          scoring="r2",
          cv=4,
          n_jobs=-1
          ).fit(x_train, y_train)

The code runs, but the score it reports is NaN at every step.

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    0.1s finished

[2021-12-30 14:15:17] Features: 1/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    0.0s finished

[2021-12-30 14:15:17] Features: 2/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 out of  26 | elapsed:    0.0s finished

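If you are doing classification, you should not use r2 for scoring, because r2 is a regression metric. You can refer to the scikit-learn help page for the list of metrics available for classification and for regression.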
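To see what is behind a NaN score, it can help to reproduce the scoring step with scikit-learn alone. By default, scikit-learn's cross-validation helpers record a failed fold as NaN (`error_score=np.nan`) instead of raising. The sketch below is only an illustration under assumptions: the question does not show `y`, so string labels are assumed here as one plausible way for "r2" to fail, and the data is synthetic. Passing `error_score="raise"` surfaces the underlying exception.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data; string labels are an assumption made for illustration.
x, y_int = make_classification(n_samples=200, n_features=10, random_state=0)
y = np.where(y_int == 1, "yes", "no")

clf = RandomForestClassifier(n_estimators=20, random_state=0)

# A classification metric works directly on the predicted class labels.
acc = cross_val_score(clf, x, y, cv=4, scoring="accuracy")
print("accuracy per fold:", acc)

# error_score="raise" re-raises the underlying exception instead of
# letting a failed fold be recorded as a placeholder score.
try:
    cross_val_score(clf, x, y, cv=4, scoring="r2", error_score="raise")
    r2_error = None
except Exception as exc:
    r2_error = exc
print("r2 failed with:", r2_error)
```

If your own labels are numeric, "r2" may not fail in this way, so treat this as a diagnostic pattern rather than an explanation of the exact error in the question.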

You should also state in the question that you are using SequentialFeatureSelector from mlxtend.

Below I used accuracy as the scoring metric, and it works:

from mlxtend.feature_selection import SequentialFeatureSelector as SFS 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic classification data: 50 features, 28 of them informative
x, y = make_classification(n_features=50, n_informative=28)

# Split features and labels in one call so the rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

sfs = SFS(
    RandomForestClassifier(),
    k_features=28,
    forward=True,
    floating=False,
    verbose=2,
    scoring="accuracy",  # a classification metric, unlike "r2"
).fit(x_train, y_train)