How can I use early stopping with XGBRFRegressor?
I've tried fitting a random forest like this:
from xgboost import XGBRFRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
forest = XGBRFRegressor(num_parallel_tree=10, num_boost_round=1000, verbose=3)
forest.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)],
    early_stopping_rounds=10,
    verbose=True,
)
However, early stopping never seems to kick in: as far as I can tell, the model fits all 10,000 requested trees, and the evaluation metric is printed only once rather than after every boosting round, as I expected.

What is the correct way to set up this kind of model (using the scikit-learn API) so that early stopping works the way I expect?
I've asked the developers for clarification here:
https://discuss.xgboost.ai/t/how-is-xgbrfregressor-intended-to-work-with-early-stopping/2391
[XGBRFRegressor has] default values and meaning of some of the parameters adjusted accordingly. In particular:

- n_estimators specifies the size of the forest to be trained; it is converted to num_parallel_tree, instead of the number of boosting rounds
- learning_rate is set to 1 by default
- colsample_bynode and subsample are set to 0.8 by default
- booster is always gbtree
You can see this in action in the code: num_parallel_tree is overridden to the input n_estimators, and num_boost_round is overridden to 1.
It's probably worth reading the paragraph before that link in the documentation, to understand how xgboost handles random forests.