Setting Tol for XGBoost Early Stopping

Question

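I am using XGBoost with early stopping. After about 1000 epochs, the model is still improving, but the improvement per round is very small. I.e.:

    clf = xgb.train(params, dtrain, num_boost_round=num_rounds, evals=watchlist, early_stopping_rounds=10)

Is it possible to set a "tol" for early stopping? That is, a minimum level of improvement required to keep early stopping from triggering.

Tol is a common parameter in SKLearn models such as MLPClassifier and QuadraticDiscriminantAnalysis. Thank you.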

Answer 1

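I don't think xgboost has a tol parameter, but you can set early_stopping_rounds higher. This parameter means that training stops if performance on the evaluation set has not improved for early_stopping_rounds consecutive rounds. If you know that your model is still improving after 1000 epochs, but very slowly, set early_stopping_rounds to 50, for example, so that it is more "tolerant" of small changes in performance.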
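To illustrate why a larger early_stopping_rounds is more tolerant, here is a minimal pure-Python sketch of a patience rule. It is not XGBoost's actual code: the helper name `stop_round` and the loss values are made up for illustration, and it assumes a metric where lower is better.

```python
def stop_round(losses, patience):
    """Return the 0-based round at which training would stop, or None.

    Hypothetical helper (not part of XGBoost): stop once `patience`
    consecutive rounds have failed to beat the best loss seen so far.
    """
    best = float("inf")
    rounds_without_improvement = 0
    for i, loss in enumerate(losses):
        if loss < best:
            # New best score: reset the patience counter.
            best = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                return i
    return None


# Made-up validation losses: a short plateau, then further improvement.
losses = [0.50, 0.40, 0.41, 0.42, 0.41, 0.39, 0.38]

print(stop_round(losses, patience=2))  # 3: stops during the plateau
print(stop_round(losses, patience=5))  # None: survives the plateau
```

With the small patience, training ends on the plateau and never reaches the later, better scores. Note that a larger patience only helps with plateaus and noise; it does not stop a run that keeps improving by tiny amounts.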

Answer 2

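The issue in the XGBoost GitHub repo is still open, so even though wrappers such as sklearn and h2o seem to already have this functionality, xgboost itself still lacks a stopping_tolerance hyperparameter...

Let's upvote the issue to speed things up, shall we?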

Answer 3

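This option has been implemented.

Simply pass a value to min_delta, which is the name the tolerance setting has in xgb.callback.EarlyStopping as of XGBoost 1.5.0:

    import xgboost as xgb

    # rounds is the patience; min_delta is the minimum absolute change
    # in the metric that counts as an improvement.
    early_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=1e-5)

    # D_train and D_valid are xgb.DMatrix objects built beforehand.
    booster = xgb.train(
        {'objective': 'binary:logistic',
         'eval_metric': ['error', 'rmse']},
        D_train,
        num_boost_round=1000,  # upper bound; early stopping may end sooner
        evals=[(D_train, 'Train'), (D_valid, 'Valid')],
        callbacks=[early_stop],
        )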
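The effect of a tolerance on a run that keeps improving by tiny amounts can be sketched in plain Python. This is only an illustration under stated assumptions, not XGBoost's implementation: the helper `stop_round_with_min_delta` is hypothetical, the losses are synthetic, lower is assumed to be better, and a round counts as an improvement only if it beats the best score by more than `min_delta`.

```python
def stop_round_with_min_delta(losses, patience, min_delta=0.0):
    """Return the 0-based round at which training would stop, or None.

    Hypothetical helper (not part of XGBoost): a round only counts as an
    improvement if it lowers the best loss by more than `min_delta`.
    """
    best = float("inf")
    stale_rounds = 0
    for i, loss in enumerate(losses):
        if best - loss > min_delta:
            # Improvement large enough to matter: reset the counter.
            best = loss
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                return i
    return None


# Synthetic losses that shrink by 0.001 every round: always improving, but barely.
losses = [0.5 - 0.001 * i for i in range(100)]

print(stop_round_with_min_delta(losses, patience=5))                  # None
print(stop_round_with_min_delta(losses, patience=5, min_delta=0.01))  # 5
```

Without a tolerance the run never stops, because every round is a strict improvement. With a tolerance of 0.01, five rounds in a row fail to improve by enough, so training stops at round 5.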

