GridSearchCV: Scoring does not use the chosen XGBRegressor score method
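Scikit-learn GridSearchCV is used for hyperparameter tuning of XGBRegressor models. Regardless of the eval_metric specified in XGBRegressor().fit(), GridSearchCV produces the same score values. At https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html the documentation says of the parameter scoring: "If None, the estimator's score method is used." This does not seem to happen: I always get the same value. How can I get results that correspond to the XGBRegressor eval_metric?
This example code: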
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.datasets import load_boston
import xgboost as xgb
rng = np.random.RandomState(31337)
boston = load_boston()
y = boston['target']
X = boston['data']
kf = KFold(n_splits=2, random_state=42)
folds = list(kf.split(X))
xgb_model = xgb.XGBRegressor(objective='reg:squarederror', verbose=False)
reg = GridSearchCV(estimator=xgb_model,
                   param_grid={'max_depth': [2], 'n_estimators': [50]},
                   cv=folds,
                   verbose=False)
reg.fit(X, y, **{'eval_metric': 'mae', 'verbose': False})
print('GridSearchCV mean(mae)?: ', reg.cv_results_['mean_test_score'])
# -----------------------------------------------
reg.fit(X, y, **{'eval_metric': 'rmse', 'verbose': False})
print('GridSearchCV mean(rmse)?: ', reg.cv_results_['mean_test_score'])
print("----------------------------------------------------")
xgb_model.set_params(**{'max_depth': 2, 'n_estimators': 50})
xgb_model.fit(X[folds[0][0], :], y[folds[0][0]], eval_metric='mae',
              eval_set=[(X[folds[0][0], :], y[folds[0][0]])], verbose=False)
print('XGBRegressor 0-mae:', xgb_model.evals_result()['validation_0']['mae'][-1])
xgb_model.fit(X[folds[0][1], :], y[folds[0][1]], eval_metric='mae',
              eval_set=[(X[folds[0][1], :], y[folds[0][1]])], verbose=False)
print('XGBRegressor 1-mae:', xgb_model.evals_result()['validation_0']['mae'][-1])
xgb_model.fit(X[folds[0][0], :], y[folds[0][0]], eval_metric='rmse',
              eval_set=[(X[folds[0][0], :], y[folds[0][0]])], verbose=False)
print('XGBRegressor 0-rmse:', xgb_model.evals_result()['validation_0']['rmse'][-1])
xgb_model.fit(X[folds[0][1], :], y[folds[0][1]], eval_metric='rmse',
              eval_set=[(X[folds[0][1], :], y[folds[0][1]])], verbose=False)
print('XGBRegressor 1-rmse:', xgb_model.evals_result()['validation_0']['rmse'][-1])
returns (the numbers above the line should be the mean of the numbers below the line):
GridSearchCV mean(mae)?: [0.70941007]
GridSearchCV mean(rmse)?: [0.70941007]
----------------------------------------------------
XGBRegressor 0-mae: 1.273626
XGBRegressor 1-mae: 1.004947
XGBRegressor 0-rmse: 1.647694
XGBRegressor 1-rmse: 1.290872
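TL;DR: what you are getting back is the so-called R2, or coefficient of determination. It is the default scoring metric of XGBRegressor's score function, which GridSearchCV picks up if scoring=None.
Compare the results of explicitly coding scoring: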
from sklearn.metrics import make_scorer, r2_score, mean_squared_error
xgb_model = xgb.XGBRegressor(objective='reg:squarederror', verbose=False)
reg = GridSearchCV(estimator=xgb_model, scoring=make_scorer(r2_score),
                   param_grid={'max_depth': [2], 'n_estimators': [50]},
                   cv=folds,
                   verbose=False)
reg.fit(X, y)
reg.best_score_
0.7333542105472226
with those of scoring=None:
reg = GridSearchCV(estimator=xgb_model, scoring=None,
                   param_grid={'max_depth': [2], 'n_estimators': [50]},
                   cv=folds,
                   verbose=False)
reg.fit(X, y)
reg.best_score_
0.7333542105472226
If you read the GridSearchCV docstring:
estimator : estimator object.
This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.
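To see that GridSearchCV really does fall back to the estimator's own score method, here is a minimal sketch. It assumes a stand-in estimator (scikit-learn's GradientBoostingRegressor on synthetic data, so it runs without xgboost), and NegMAERegressor is a hypothetical subclass written only for this illustration. It overrides score to return negative MAE and checks that scoring=None then matches the built-in 'neg_mean_absolute_error' scorer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, KFold


class NegMAERegressor(GradientBoostingRegressor):
    """Hypothetical stand-in: same model, but score() returns negative MAE."""

    def score(self, X, y, sample_weight=None):
        return -mean_absolute_error(y, self.predict(X), sample_weight=sample_weight)


# Synthetic data (an assumption; the original post uses the Boston dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
cv = KFold(n_splits=2, shuffle=True, random_state=42)
grid = {'max_depth': [2], 'n_estimators': [50]}

# scoring=None: GridSearchCV calls the estimator's own score method.
via_score = GridSearchCV(NegMAERegressor(random_state=0), grid, scoring=None, cv=cv)
via_score.fit(X_demo, y_demo)

# Explicit scorer computing the same quantity.
via_scorer = GridSearchCV(GradientBoostingRegressor(random_state=0), grid,
                          scoring='neg_mean_absolute_error', cv=cv)
via_scorer.fit(X_demo, y_demo)

same = np.isclose(via_score.best_score_, via_scorer.best_score_)
print(same)
```

Overriding score is shown only to make the mechanism visible; in practice passing scoring explicitly is usually the simpler route.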
At this point you may want to check the documentation for xgb_model.score?:
Signature: xgb_model.score(X, y, sample_weight=None)
Docstring:
Return the coefficient of determination R^2 of the prediction.
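A quick way to confirm what score returns is to compare it against r2_score on the same predictions. The sketch below is an illustration under assumptions: it uses scikit-learn's GradientBoostingRegressor and synthetic data as stand-ins so that it runs without xgboost. The docstring above suggests XGBRegressor's score behaves the same way, but the variable names and data here are not from the original post.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

# Synthetic regression data (assumption; stands in for the Boston dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, y_train = X_demo[:100], y_demo[:100]
X_test, y_test = X_demo[100:], y_demo[100:]

model = GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=0)
model.fit(X_train, y_train)

# The default regressor score ...
default_score = model.score(X_test, y_test)
# ... versus R^2 computed explicitly from the predictions.
explicit_r2 = r2_score(y_test, model.predict(X_test))

print(np.isclose(default_score, explicit_r2))
```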
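So, with the help of these docs, if you do not like the default R2 scoring function of XGBRegressor, provide your scoring function explicitly to GridSearchCV. For example, if you want RMSE you may do: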
reg = GridSearchCV(estimator=xgb_model,
                   scoring=make_scorer(mean_squared_error, squared=False),
                   param_grid={'max_depth': [2], 'n_estimators': [50]},
                   cv=folds,
                   verbose=False)
reg.fit(X, y)
reg.best_score_
4.618242594168436
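One caveat with the make_scorer call above: make_scorer defaults to greater_is_better=True, so with more than one candidate in param_grid the search would prefer the largest RMSE. Recent scikit-learn releases also deprecate the squared argument of mean_squared_error. A possible alternative, sketched here under assumptions (stand-in GradientBoostingRegressor and synthetic data so it runs without xgboost; the 'rmse' and 'mae' dictionary keys are arbitrary labels chosen for this example), is to use the built-in negated scorers and flip the sign when reporting:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data (assumption; stands in for the Boston dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid={'max_depth': [2, 3], 'n_estimators': [50]},
    # Built-in scorers are negated errors, so "greater is better" still holds.
    scoring={'rmse': 'neg_root_mean_squared_error', 'mae': 'neg_mean_absolute_error'},
    refit='rmse',  # choose the best candidate by (negated) RMSE
    cv=KFold(n_splits=2, shuffle=True, random_state=42),
)
search.fit(X_demo, y_demo)

# Flip the sign to report ordinary (positive) errors, one value per candidate.
mean_rmse = -search.cv_results_['mean_test_rmse']
mean_mae = -search.cv_results_['mean_test_mae']
print('mean RMSE per candidate:', mean_rmse)
print('mean MAE per candidate:', mean_mae)
print('best params by RMSE:', search.best_params_)
```

With this setup best_score_ is the negated RMSE of the selected candidate, so -search.best_score_ is the smallest mean RMSE in the grid.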