A question about parallelism in the h2o.grid() function

Question

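I am trying to do some tuning in R with the h2o.grid() function from the h2o package. Whenever I set the parameter parallelism to a value greater than 1, it always shows the warning: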

Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)

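In addition, the model_ids of the final grid object contain many models whose IDs end in _cv_1, _cv_2, and so on, and the number of models does not equal the max_models I set in the search_criteria list. I think these are just the models built during cross-validation, not the final models.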

When I set parallelism greater than 1: [screenshot of the resulting model_ids not preserved]

When I leave parallelism at its default value or set it to 1, the results are normal: all model IDs end in _model_1, _model_2, and so on.

When I leave parallelism at its default value or set it to 1: [screenshot of the resulting model_ids not preserved]
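A minimal base-R sketch for checking this symptom: the hypothetical helper `split_model_ids()` below separates a grid's `model_ids` into final models and cross-validation models. It assumes, as in the output described above, that CV models are exactly the ones whose IDs end in `_cv_<number>`; the final-model count can then be compared with `max_models`.

```r
# Hypothetical helper (not part of the h2o package): split grid model IDs into
# final models and cross-validation models, based on the "_cv_<k>" suffix.
split_model_ids <- function(model_ids) {
  ids <- unlist(model_ids)                 # grid@model_ids is a list of strings
  is_cv <- grepl("_cv_[0-9]+$", ids)       # CV models end in _cv_1, _cv_2, ...
  list(final = ids[!is_cv], cv = ids[is_cv])
}

# Usage (needs the grid object from the code below):
# s <- split_model_ids(rf_h2o_grid_tune_random@model_ids)
# length(s$final)   # compare with sc$max_models
```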

Here is my code:

# set the grid
rf_h2o_grid <- list(mtries = seq(3, ncol(train_h2o), 4),
                    max_depth = c(5, 10, 15, 20))

# set the search_criteria
sc <- list(strategy = "RandomDiscrete", 
           seed = 100,
           max_models = 5
           )

# random grid tuning
rf_h2o_grid_tune_random <- h2o.grid(
  algorithm = "randomForest", 
  x = x, 
  y = y,
  training_frame = train_h2o,
  nfolds = 5,                     # use cv to validate the parameters
  fold_assignment = "Stratified",   
  ntrees = 100,
  seed = 100,
  hyper_params = rf_h2o_grid,
  search_criteria = sc
  # , parallelism = 6         # when I set it larger than 1, the result always includes some "_cv_" models
  )

So how do I use parallelism correctly in h2o.grid()? Thanks for your help!

Answer 1

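This is a real issue with parallelism in grid search; it had been noticed before but was never properly reported. Thank you for raising it, and we will fix it as soon as possible. If you want to track progress, see https://h2oai.atlassian.net/browse/PUBDEV-7886.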

Until it is properly fixed, you must avoid using cross-validation (nfolds) and parallelism together in the same grid.
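A sketch of that workaround, assuming the objects from the question: the hypothetical helper `safe_grid_args()` builds the extra `h2o.grid()` arguments for one of two configurations, either cross-validation with models built one at a time (`parallelism = 1`), or models built in parallel and scored on a separate validation frame (no `nfolds`), and refuses to combine the two. The helper itself is plain R and does not contact H2O; `valid_h2o` in the usage comments is a hypothetical held-out H2OFrame.

```r
# Hypothetical helper (not part of the h2o package): return extra h2o.grid()
# arguments for one of the two safe configurations while PUBDEV-7886 is open.
safe_grid_args <- function(parallelism = 1, nfolds = 0, validation_frame = NULL) {
  if (parallelism > 1 && nfolds > 1) {
    stop("nfolds > 1 and parallelism > 1 should not be combined (PUBDEV-7886)")
  }
  args <- list(parallelism = parallelism)
  if (nfolds > 1) {
    # Cross-validation path: only used with sequential model building.
    args$nfolds <- nfolds
    args$fold_assignment <- "Stratified"
  }
  if (!is.null(validation_frame)) {
    # Parallel path: score models on a held-out frame instead of CV.
    args$validation_frame <- validation_frame
  }
  args
}

# Usage sketch (needs a running H2O cluster and the objects from the question):
# base_args <- list(algorithm = "randomForest", x = x, y = y,
#                   training_frame = train_h2o, ntrees = 100, seed = 100,
#                   hyper_params = rf_h2o_grid, search_criteria = sc)
# grid_cv  <- do.call(h2o::h2o.grid, c(base_args, safe_grid_args(nfolds = 5)))
# grid_par <- do.call(h2o::h2o.grid, c(base_args,
#                     safe_grid_args(parallelism = 6, validation_frame = valid_h2o)))
```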

Regarding the following error:

Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)

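If the error is reproducible, you should be able to get more details by running the grid with verbose=True. Adding the full error message to the ticket above may also help.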

Answer 2

This happens because you set max_models = 5, so your grid builds only 5 models and then stops.

There are three ways to set early stopping criteria:

"max_models": the maximum number of models to build
"max_runtime_secs": the maximum run time, in seconds
by setting "stopping_rounds", "stopping_metric", and "stopping_tolerance"
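For illustration, a `search_criteria` list that combines all three options might look like the following. The numbers are placeholders rather than recommendations, and presumably the search stops as soon as the first of these limits is reached.

```r
# Illustrative RandomDiscrete search criteria using all three stopping options;
# the values are placeholders, not tuned recommendations.
sc_early_stop <- list(
  strategy           = "RandomDiscrete",
  seed               = 100,
  max_models         = 20,      # option 1: cap on the number of models
  max_runtime_secs   = 600,     # option 2: cap on total run time (10 minutes)
  stopping_rounds    = 5,       # option 3 (roughly): stop if the metric stops
  stopping_metric    = "AUTO",  #   improving by a relative stopping_tolerance
  stopping_tolerance = 0.001    #   over stopping_rounds consecutive models
)

# Pass it to the grid as: search_criteria = sc_early_stop
```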
