Gridsearch giving nan values for AUC score
I'm trying to run a grid search on a random forest classifier, using AUC as the score.
Here is my code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import make_scorer, roc_auc_score
estimator = RandomForestClassifier()
scoring = {'auc': make_scorer(roc_auc_score, multi_class="ovr")}
kfold = RepeatedStratifiedKFold(n_splits=3, n_repeats=10, random_state=42)
grid_search = GridSearchCV(estimator=estimator, param_grid=param_grid,
                           cv=kfold, n_jobs=-1, scoring=scoring)
grid_search.fit(X, y)
However, when I run this, I get nan for the AUC scores along with the following warnings:
UserWarning,
/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:687: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 674, in _score
scores = scorer(estimator, X_test, y_test)
File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 88, in __call__
*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 243, in _score
**self._kwargs)
File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 538, in roc_auc_score
multi_class, average, sample_weight)
File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_ranking.py", line 595, in _multiclass_roc_auc_score
if not np.allclose(1, y_score.sum(axis=1)):
File "/opt/conda/lib/python3.7/site-packages/numpy/core/_methods.py", line 47, in _sum
return umr_sum(a, axis, dtype, out, keepdims, initial, where)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
UserWarning,
/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py:921: UserWarning: One or more of the test scores are non-finite: [nan nan nan ... nan nan nan]
category=UserWarning
/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py:921: UserWarning: One or more of the train scores are non-finite: [nan nan nan ... nan nan nan]
category=UserWarning
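I actually figured it out. I needed to set needs_proba to True in make_scorer, so that the grid search doesn't try to compute the AUC score directly from my estimator's (categorical) predictions. Without it, the scorer passes the 1-D output of predict to roc_auc_score, and the multiclass code path fails on y_score.sum(axis=1), which is the AxisError in the traceback.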
scoring = {'auc': make_scorer(roc_auc_score, needs_proba=True, multi_class="ovr")}
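For anyone who wants to check this on their own setup, here is a minimal sketch of the same idea. It is not the original data or parameter grid: the synthetic dataset from make_classification, the tiny param_grid, and the helper name auc_ovr_scorer are all assumptions made for illustration. Instead of needs_proba, it uses a plain scorer callable that calls predict_proba itself, which should behave the same way and, as far as I know, avoids depending on needs_proba, which newer scikit-learn releases replace with response_method="predict_proba". With a dict of scorers, refit is set to the scorer key.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Synthetic 3-class data standing in for the real X and y (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

# Why the original scorer failed: predict returns 1-D class labels,
# while multiclass roc_auc_score needs a 2-D array of class probabilities.
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)
labels_shape = clf.predict(X).shape        # one label per sample
proba_shape = clf.predict_proba(X).shape   # one column per class


def auc_ovr_scorer(estimator, X_test, y_test):
    """Hypothetical helper: one-vs-rest AUC computed from predict_proba."""
    return roc_auc_score(y_test, estimator.predict_proba(X_test),
                         multi_class="ovr")


param_grid = {"n_estimators": [10, 20]}  # illustrative grid, not the original
kfold = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=42)
grid_search = GridSearchCV(estimator=RandomForestClassifier(random_state=42),
                           param_grid=param_grid, cv=kfold,
                           scoring={"auc": auc_ovr_scorer}, refit="auc")
grid_search.fit(X, y)

mean_auc = grid_search.cv_results_["mean_test_auc"]
print(labels_shape, proba_shape)
print(mean_auc)
```

If the mean_test_auc values print as finite numbers rather than nan, the scorer is receiving probabilities. The built-in scoring string "roc_auc_ovr" should be another way to get the same metric without writing a scorer at all.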