Early stopping with Pycaret? Overfitting with Catboost and XGBoost

Question

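I am comparing the performance of Catboost, XGBoost and LinearRegression in Pycaret. Catboost and XGBoost are untuned.

So far I see that Catboost and XGBoost are overfitting.

For LinearRegression the train/test scores are: train R2 0.72, test R2 0.65.

Is there a way to set 'Early Stopping' for XGBoost and Catboost to avoid this overfitting? Or are there other parameters in Pycaret that can be tuned to avoid overfitting?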

Answer 1

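There are several more ways to avoid overfitting:

Feature selection (can be configured in setup): there are two types with a variable threshold, or RFE (Recursive Feature Elimination), or SHAP
Tune both Catboost and XGBoost (or other tree algorithms)
Increase n_estimators to 100, 500, or 1000
Run the algorithm multiple times
Change the train/test split: 80/20, 70/30, etc.
Remove correlated inputs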
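The last suggestion, removing correlated inputs, can be sketched outside Pycaret with plain pandas. This is only a minimal illustration: the helper name drop_correlated_features, the 0.9 threshold, and the toy data are assumptions made for the example, not something from the question or from Pycaret.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute Pearson correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so that each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is highly correlated with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)


# Toy data (hypothetical): "b" is an exact multiple of "a", "c" is only weakly related.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
reduced = drop_correlated_features(demo, threshold=0.9)
print(list(reduced.columns))
```

Note that this filter always keeps the earlier column of a correlated pair, so column order decides which feature survives. Whether that is acceptable depends on your data.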

Answer 2

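First of all, how are you comparing the models without tuning their hyperparameters? Seeing your code would help.

There is an early stopping parameter in pycaret, but I am not sure what it does. It is also only available for the tune_model function. If you let pycaret search the hyperparameters of xgboost and catboost automatically, they should no longer overfit. This is because the search tunes the regularization hyperparameters (L1 and/or L2 regularization on the leaf weights) and compares scores on the validation set.

With catboost (or xgboost or lightgbm) you can set the early_stopping_rounds parameter to enable early stopping: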

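import catboost

# x_train, y_train, x_test, y_test are assumed to be defined already.
# For a regression task like the one in the question, use catboost.CatBoostRegressor instead.
cb = catboost.CatBoostClassifier(n_estimators=1000)
# eval_set supplies the held-out data whose metric is monitored for early stopping.
cb.fit(x_train, y_train, eval_set=(x_test, y_test), early_stopping_rounds=10, plot=True)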

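You need to provide an eval_set; otherwise there is nothing to evaluate early stopping against. I don't think it is currently possible to pass early_stopping_rounds as a parameter to any of the relevant pycaret functions you are probably using.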
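To see why the eval_set is required, here is a simplified sketch of the idea behind early_stopping_rounds. It is not CatBoost's actual implementation; the function name early_stopping_point and the loss values are made up for illustration. Training stops once the validation loss has not improved for a given number of consecutive iterations, so without validation scores there is nothing to monitor.

```python
from typing import Sequence, Tuple


def early_stopping_point(val_losses: Sequence[float], patience: int) -> Tuple[int, int]:
    """Return (best_iteration, stopped_at) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive iterations. Indices are zero-based.
    """
    best_iter = 0
    best_loss = float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and keep training.
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            # No improvement for `patience` iterations: stop here.
            return best_iter, i
    # Early stopping never triggered, so every iteration was used.
    return best_iter, len(val_losses) - 1


# Hypothetical validation losses: they improve, then rise as the model overfits.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.61, 0.65]
best, stopped = early_stopping_point(losses, patience=3)
print(best, stopped)
```

In this toy run the best iteration is index 3 and training stops at index 6, three iterations after the last improvement. The model would then be rolled back to the best iteration, which is roughly what the boosting libraries do when early stopping is enabled.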

python

evaluation

model

pycaret