How can I use early stopping with XGBRFRegressor?

I tried fitting a random forest like this:

from xgboost import XGBRFRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(random_state=7)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

forest = XGBRFRegressor(num_parallel_tree=10, num_boost_round=1000, verbose=3)

forest.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)],
    early_stopping_rounds=10,
    verbose=True
)

However, early stopping never seems to take effect, and as far as I can tell the model fits all 10,000 requested trees. The evaluation metric is printed only once, rather than after every boosting round as I expected.

What is the correct way to set up this kind of model (staying within the scikit-learn API) so that early stopping takes effect as I expect?

I have asked the developers for clarification here:

https://discuss.xgboost.ai/t/how-is-xgbrfregressor-intended-to-work-with-early-stopping/2391

The docs say:

[XGBRFRegressor has] default values and meaning of some of the parameters adjusted accordingly. In particular:

  • n_estimators specifies the size of the forest to be trained; it is converted to num_parallel_tree, instead of the number of boosting rounds
  • learning_rate is set to 1 by default
  • colsample_bynode and subsample are set to 0.8 by default
  • booster is always gbtree

You can see this in action in the code: num_parallel_tree is overridden to the input n_estimators, and num_boost_round is overridden to 1.
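
As a quick sanity check (a sketch reusing the toy data from the question; num_boosted_rounds() needs a reasonably recent xgboost, roughly 1.4+), you can confirm after fitting that the booster holds exactly one boosting round, so there is nothing for early stopping to trim:

from xgboost import XGBRFRegressor
from sklearn.datasets import make_regression

X, y = make_regression(random_state=7)

# n_estimators is converted to num_parallel_tree; the whole forest is grown
# in a single boosting round
rf = XGBRFRegressor(n_estimators=10)
rf.fit(X, y)

print(rf.get_booster().num_boosted_rounds())  # 1 boosting round
print(len(rf.get_booster().get_dump()))       # 10 trees, all grown in that one round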

It is probably worth reading the paragraph of the docs just before that link to understand how xgboost handles random forests.
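
If the goal is simply to combine forest-style trees with early stopping, one workaround (a sketch, not an officially documented recipe) is to use the plain XGBRegressor, set num_parallel_tree yourself, and let early stopping act on the boosting rounds; each round then adds a small forest of trees:

from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

booster = XGBRegressor(
    n_estimators=1000,         # boosting rounds: the axis early stopping acts on
    num_parallel_tree=10,      # trees grown per round, i.e. a small forest each round
    subsample=0.8,             # row subsampling, mirroring the RF wrapper's default
    colsample_bynode=0.8,      # per-split column subsampling, mirroring the RF default
    early_stopping_rounds=10,  # constructor argument in xgboost >= 1.6; older
                               # versions accept it in fit() instead
)

booster.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=True)
print(booster.best_iteration)  # round after which the eval metric stopped improving

Note that this is gradient boosting over small forests rather than a single bagged random forest (learning_rate keeps its boosting default here), so it addresses the early-stopping part of the question rather than reproducing XGBRFRegressor exactly.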