Solution for "nan" score in step forward selection using Python
I am running step forward feature selection with mlxtend's Sequential Feature Selector (SFS).
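from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# x, y: my feature matrix and class labels; split them together so rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

sfs = SFS(RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1),
          k_features=28,
          forward=True,
          floating=False,
          verbose=2,
          scoring="r2",
          cv=4,
          n_jobs=-1
          ).fit(x_train, y_train)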
The code runs, but the score it reports is NaN:
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 28 out of 28 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 28 out of 28 | elapsed: 0.1s finished
[2021-12-30 14:15:17] Features: 1/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 27 out of 27 | elapsed: 0.0s finished
[2021-12-30 14:15:17] Features: 2/28 -- score: nan[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 26 out of 26 | elapsed: 0.0s finished
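If you are doing classification, you should not use r2 for scoring: r2 is a regression metric that treats class labels as continuous numbers. Refer to the scikit-learn help page for the lists of metrics intended for classification and for regression. A NaN score usually means that fitting or scoring raised an error inside cross-validation, which scikit-learn records as NaN by default (error_score=np.nan) instead of stopping the run.
You should also state in the question that you are using SequentialFeatureSelector from mlxtend.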
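To illustrate why r2 is a poor fit for class labels, here is a small sketch with made-up labels (not taken from the question). It compares scikit-learn's accuracy_score and r2_score before and after renaming the classes, which should not change how good the predictions are:

```python
from sklearn.metrics import accuracy_score, r2_score

# Made-up 3-class example: 3 of the 4 predictions are correct
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 2]

# Same predictions after swapping the names of classes 0 and 1 (a purely cosmetic change)
relabel = {0: 1, 1: 0, 2: 2}
y_true_relabeled = [relabel[c] for c in y_true]
y_pred_relabeled = [relabel[c] for c in y_pred]

# Accuracy only counts exact matches, so it is 0.75 both times
acc_before = accuracy_score(y_true, y_pred)
acc_after = accuracy_score(y_true_relabeled, y_pred_relabeled)

# r2 measures numeric distance between labels, so renaming classes changes it
r2_before = r2_score(y_true, y_pred)                      # about 0.64
r2_after = r2_score(y_true_relabeled, y_pred_relabeled)   # about -0.45

print(acc_before, acc_after)
print(r2_before, r2_after)
```

Because r2 depends on the arbitrary numeric codes assigned to classes, it says little about classification quality, whereas accuracy is unaffected by relabeling.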
Below I used accuracy instead, and it works:
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic binary classification data: 50 features, 28 of them informative
x, y = make_classification(n_features=50, n_informative=28)

# Split features and labels in one call so the rows stay aligned
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Forward selection of 28 features, scored with a classification metric
sfs = SFS(
    RandomForestClassifier(),
    k_features=28,
    forward=True,
    floating=False,
    verbose=2,
    scoring="accuracy",
).fit(x_train, y_train)