Error in data$update_params(params = params) : [LightGBM] [Fatal] Cannot change max_bin after constructed Dataset handle

I installed the lightgbm package in RStudio and tried to build a model with it. The script is based on Retip.

The function looks like this:

> fit.lightgbm
function (training, testing) 
{
  train <- as.matrix(training)
  test <- as.matrix(testing)
  coltrain <- ncol(train)
  coltest <- ncol(test)
  dtrain <- lightgbm::lgb.Dataset(train[, 2:coltrain], label = train[, 
                                                                     1])
  lightgbm::lgb.Dataset.construct(dtrain)
  dtest <- lightgbm::lgb.Dataset.create.valid(dtrain, test[,2:coltest], label = test[, 1])
  valids <- list(test = dtest)
  params <- list(objective = "regression", metric = "rmse")
  modelcv <- lightgbm::lgb.cv(params, dtrain, nrounds = 5000, 
                              nfold = 10, valids, verbose = 1, early_stopping_rounds = 1000, 
                              record = TRUE, eval_freq = 1L, stratified = TRUE, max_depth = 4, 
                              max_leaf = 20, max_bin = 50)
  best.iter <- modelcv$best_iter
  params <- list(objective = "regression_l2", metric = "rmse")
  model <- lightgbm::lgb.train(params, dtrain, nrounds = best.iter, 
                               valids, verbose = 0, early_stopping_rounds = 1000, record = TRUE, 
                               eval_freq = 1L, max_depth = 4, max_leaf = 20, max_bin = 50)
  print(paste0("End training"))
  return(model)
}

However, when I try to run the function from Retip:

lightgbm <- fit.lightgbm(training, testing)

I get a fatal error:

Error in data$update_params(params = params) : 
  [LightGBM] [Fatal] Cannot change max_bin after constructed Dataset handle. 

The error only goes away when I change max_bin to max_bin = 255.

I have looked through the documentation and these issues:

What is the right way for hyper parameter tuning for LightGBM classification? #1339

[Python] max_bin weird behaviour #1053

Any ideas or suggestions on what I should do?

This was cross-posted to https://github.com/microsoft/LightGBM/issues/4019 and has been answered there.

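Constructing a Dataset object in LightGBM performs some important preprocessing steps (see this prior answer) that happen before training, and none of the Dataset parameters can be changed after construction. max_bin is one of those parameters, so passing a different value to lgb.cv() or lgb.train() once the Dataset has been constructed raises this error.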

Passing max_bin = 50 to lgb.Dataset(), instead of to lgb.cv() / lgb.train() as in the original post's code, will result in successful training without this error.
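As a minimal sketch of that fix, the snippet below sets max_bin through the `params` argument of lgb.Dataset() and keeps only training parameters in the list given to lgb.train(). The simulated matrix `X`, the response `y`, and the small `nrounds` value are assumptions made purely for illustration; they are not taken from the Retip script.

```r
library(lightgbm)

# Simulated data, for illustration only (not the Retip training set)
set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200)
y <- X[, 1] + rnorm(200, sd = 0.1)

# Dataset-level parameters such as max_bin are fixed when the Dataset is built,
# so they are supplied here rather than to lgb.cv() / lgb.train()
dtrain <- lgb.Dataset(X, label = y, params = list(max_bin = 50L))

# Training-level parameters only; max_bin is deliberately absent
params <- list(
  objective = "regression",
  metric = "rmse",
  max_depth = 4L,
  num_leaves = 20L
)

model <- lgb.train(
  params = params,
  data = dtrain,
  nrounds = 10L,
  verbose = -1L
)
```

The same `dtrain` object can then be handed to lgb.cv() as well, again without repeating max_bin in the call.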