R xgboost: validation error as stopping metric
I am using training and validation datasets with an xgboost binary classification model.
params5 <- list(booster = "gbtree", objective = "binary:logistic",
                eta = 0.0001, gamma = 0.5, max_depth = 15, min_child_weight = 1,
                subsample = 0.6, colsample_bytree = 0.4, seed = 2222)
xgb_MOD5 <- xgb.train(params = params5, data = dtrain, nrounds = 4000,
                      watchlist = list(validation = dvalid, train = dtrain),
                      print_every_n = 30, early_stopping_rounds = 100,
                      maximize = FALSE, serialize = TRUE)
It automatically picks the training error as the stopping metric, so the model keeps training even while it is overfitting.
Multiple eval metrics are present. Will use train_error for early stopping.
Will train until train_error hasn't improved in 100 rounds.
How can I specify the validation error as the stopping metric?
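I don't use the R bindings of xgboost, and the R package documentation is not specific on this point. However, the Python API documentation (see the documentation of the early_stopping_rounds parameter) has a relevant statement:
Requires at least one item in evals. If there’s more than one, will use the last.
Here, evals is the list of samples on which the metrics are evaluated, i.e. it is analogous to your watchlist parameter. So my guess is that you only need to swap the order of the items in the list you pass for that parameter, so that the validation set comes last.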
Thanks to @abhiieor for the solution. From what I observed, when only the validation set is in the watchlist:
xgb_MOD5 <- xgb.train(params = params5, data = dtrain, nrounds = 400,
                      watchlist = list(validation = dvalid),
                      print_every_n = 30, early_stopping_rounds = 100,
                      maximize = FALSE, serialize = TRUE)
Log output at run time:
[1] validation-error:0.222037
Will train until validation_error hasn't improved in 100 rounds.
[31] validation-error:0.201712
[61] validation-error:0.201635
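If we want to see both the training error and the validation error at run time, put the validation set second (that is, last) in the watchlist. The validation error is then used as the stopping metric.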
xgb_MOD5 <- xgb.train(params = params5, data = dtrain, nrounds = 400,
                      watchlist = list(train = dtrain, validation = dvalid),
                      print_every_n = 30, early_stopping_rounds = 100,
                      maximize = FALSE, serialize = TRUE)
[1] train-error:0.202131 validation-error:0.232341
Multiple eval metrics are present. Will use validation_error for early stopping.
Will train until validation_error hasn't improved in 100 rounds.
[31] train-error:0.174278 validation-error:0.202871
[61] train-error:0.173909 validation-error:0.202288
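The ordering rule can also be tried on the demo data that ships with the xgboost package. This is only a minimal sketch, not the setup from the question: the agaricus data, the parameter values, and the names wl and bst are assumptions chosen for illustration, and it assumes a release of the R package whose xgb.train still accepts the watchlist argument used above (newer releases name it evals). The eval_metric is set explicitly so that only the position of the validation set in the list decides which score drives early stopping.

```r
library(xgboost)

# Demo data bundled with the xgboost package (an assumption for illustration;
# substitute your own dtrain / dvalid).
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dvalid <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

# Illustrative parameter values, not the ones from the question.
params <- list(objective = "binary:logistic", eval_metric = "error",
               eta = 0.1, max_depth = 3)

# The validation set goes LAST: the last watchlist entry is the one
# monitored for early stopping, the train entry is only reported.
wl <- list(train = dtrain, validation = dvalid)

bst <- xgb.train(params = params, data = dtrain, nrounds = 50,
                 watchlist = wl, print_every_n = 10,
                 early_stopping_rounds = 5, maximize = FALSE, verbose = 1)
```

With the validation set last in wl, the startup message should name validation_error as the stopping metric, as in the log above. Reversing the order of wl should switch it back to train_error.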