XGBoost model performance reporting on validation data
I want to make use of XGBoost's early.stop.round feature to avoid overfitting during training. To that end, I use the following code:
param2 <- list("objective" = "reg:linear",
               "eval_metric" = "rmse",
               "max_depth" = 15,
               "eta" = 0.03,
               "gamma" = 0,
               "subsample" = 0.5,
               "colsample_bytree" = 0.6,
               "min_child_weight" = 5,
               "alpha" = 0.15)
watchlist <- list(train = xgb.DMatrix(data = train_matrix, label = output_train),
                  test = xgb.DMatrix(data = total_matrix[ind, ], label = as.matrix(output_total[ind, ])))
bst <- xgboost(data = train_matrix, label = output_train, nrounds = 500, watchlist = watchlist,
               early.stop.round = 5, verbose = 2, param = param2, missing = NaN)
So, as required, I created train and test xgb.DMatrix objects for the watchlist and passed it to xgboost(). I made sure verbose is set so that intermediate results are printed. But with verbose=2 the log I get looks like:
tree prunning end, 1 roots, 1692 extra nodes, 0 pruned nodes ,max_depth=15
[74] train-rmse:0.129515
tree prunning end, 1 roots, 1874 extra nodes, 0 pruned nodes ,max_depth=15
[75] train-rmse:0.128455
tree prunning end, 1 roots, 1826 extra nodes, 0 pruned nodes ,max_depth=15
[76] train-rmse:0.127804
tree prunning end, 1 roots, 1462 extra nodes, 0 pruned nodes ,max_depth=15
[77] train-rmse:0.126874
tree prunning end, 1 roots, 1848 extra nodes, 0 pruned nodes ,max_depth=15
[78] train-rmse:0.125914
while verbose=1 gives me:
[74] train-rmse:0.129515
[75] train-rmse:0.128455
[76] train-rmse:0.127804
[77] train-rmse:0.126874
[78] train-rmse:0.125914
But none of these gives me the model performance on the test DMatrix at each step. I also tried, without success:

- verbose=T and verbose=F
- renaming the test DMatrix to validation

What am I missing to get the desired output?
Apparently, reporting performance on the test data set can only be done with xgb.train(), not xgboost(). The relevant modified code (not repeating the param section above) looks like:
dtrain <- xgb.DMatrix(data = train_matrix, label = output_train)
dtest <- xgb.DMatrix(data = total_matrix[ind, ], label = as.matrix(output_total[ind, ]))
watchlist <- list(train = dtrain, test = dtest)
bst <- xgb.train(data = dtrain, nrounds = 500, watchlist = watchlist,
                 early.stop.round = 5, verbose = 1, param = param2, missing = NaN)