Error while running GridSearchCV for XGBoost hyperparameters in Python


I have a dataset like the following; the first 5 rows are shown for reference:

    gvkey   year    ebit_diff   cogs_diff   revt_diff   xad_diff    xint_diff   xrd_diff    xrent_diff  xsga_diff
0   1004    2011    0.007816    0.081074    0.051726    -0.02617    0.011864    -0.052201   -0.016440   -0.048060
1   1004    2012    -0.028573   0.032022    0.002105    -0.02617    0.035253    -0.052201   -0.024924   -0.050444
2   1004    2013    -0.039717   -0.080926   -0.079771   -0.02617    0.011793    -0.052201   -0.009906   -0.050436
3   1004    2014    -0.027915   -0.184351   -0.169031   -0.02617    -0.012772   -0.052201   -0.032912   -0.094717
4   1004    2015    -0.185687   -0.243326   -0.291618   -0.02617    -0.126708   -0.052201   -0.059853   -0.126411


I don't need the categorical variables 'gvkey' and 'year'. I ran train_test_split and am now running XGBoost with GridSearchCV to find the best hyperparameters.

X_train:

    cogs_diff   revt_diff   xad_diff    xint_diff   xrd_diff    xrent_diff  xsga_diff
0   0.081074    0.051726    -0.02617    0.011864    -0.052201   -0.016440   -0.048060
1   0.032022    0.002105    -0.02617    0.035253    -0.052201   -0.024924   -0.050444
2   -0.080926   -0.079771   -0.02617    0.011793    -0.052201   -0.009906   -0.050436
3   -0.184351   -0.169031   -0.02617    -0.012772   -0.052201   -0.032912   -0.094717
4   -0.243326   -0.291618   -0.02617    -0.126708   -0.052201   -0.059853   -0.126411

cogs_diff     float64
revt_diff     float64
xad_diff      float64
xint_diff     float64
xrd_diff      float64
xrent_diff    float64
xsga_diff     float64
dtype: object


y_train:

0    0.007816
1   -0.028573
2   -0.039717
3   -0.027915
4   -0.185687
Name: ebit_diff, dtype: float64

```
import time

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# 1. Set up a parameter grid for XGBoost
params = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.001, 0.05, 0.1],
    "n_estimators": [20, 40, 60],
    "max_features": [2, 4, 6],
}

# 2. Set up the XGBoost classifier so that the performance metric is RMSE, not something else
xgb = XGBClassifier(eval_metric="rmse")

# 3. Set up GridSearchCV: 5-fold cross-validation for hyperparameter tuning on the training set
start_time = time.time()

grid = GridSearchCV(estimator=xgb, param_grid=params, cv=5, scoring="roc_auc", verbose=3)
grid.fit(X_train, y_train)
```


However, I get the following error:

ValueError: continuous format is not supported


Does anyone know how to fix this?

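Your problem looks like a regression problem (the target ebit_diff is a continuous float), but you instantiated an XGBClassifier with eval_metric='rmse'. You should probably be using XGBRegressor instead.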
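One way to confirm that the target is continuous is scikit-learn's `type_of_target` helper. The sketch below is only an illustration: it uses the five `ebit_diff` values shown in the question, plus a made-up array of 0/1 labels (an assumption, not data from the question) for contrast.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# The first five ebit_diff values from the question.
y_sample = np.array([0.007816, -0.028573, -0.039717, -0.027915, -0.185687])

# Real-valued targets are reported as "continuous", i.e. a regression target.
target_kind = type_of_target(y_sample)
print(target_kind)

# For contrast, hypothetical 0/1 class labels are reported as "binary".
label_kind = type_of_target(np.array([0, 1, 1, 0, 1]))
print(label_kind)
```

The word "continuous" here is the same one that appears in the error message.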

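Also, you passed scoring='roc_auc' when constructing GridSearchCV. That is likely what raises the exception, because the area under the ROC curve is defined for classification labels and does not apply to a regression target (this is probably what "continuous format is not supported" is trying to tell you).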