How can I use early stopping with XGBRFRegressor?
I've tried fitting a random forest like this:
from xgboost import XGBRFRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
forest = XGBRFRegressor(num_parallel_tree=10, num_boost_round=1000, verbose=3)
forest.fit(
    X_train,
    y_train,
    eval_set=[(X_test, y_test)],
    early_stopping_rounds=10,
    verbose=True,
)
However, early stopping never seems to kick in: as far as I can tell, the model fits all 10,000 requested trees, and the evaluation metric is printed only once rather than after every boosting round, as I expected.

What is the correct way to set up this kind of model (using the scikit-learn API) so that early stopping works the way I expect?
I've asked the developers for clarification here:
https://discuss.xgboost.ai/t/how-is-xgbrfregressor-intended-to-work-with-early-stopping/2391
[XGBRFRegressor has] default values and meaning of some of the parameters adjusted accordingly. In particular:

- n_estimators specifies the size of the forest to be trained; it is converted to num_parallel_tree, instead of the number of boosting rounds
- learning_rate is set to 1 by default
- colsample_bynode and subsample are set to 0.8 by default
- booster is always gbtree
You can see this in action in the code: num_parallel_tree is overridden to the input n_estimators, and num_boost_round is overridden to 1.
It's probably worth reading the paragraph before that link in the documentation, to understand how xgboost handles random forests.