Ensembling in h2o - missing models

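I am using the h2o package to build an ensemble from GLM models with different regularization parameters (alpha, lambda). When I try to build the ensemble, following the documentation: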

ensemble <- h2o.stackedEnsemble(x = predictors,
                                y = response,
                                training_frame = train,
                                model_id = "ensemble",
                                base_models = list(glm_grid@model_ids)) 

where glm_grid@model_ids holds the models from a grid search used to find the best alpha and lambda regularization parameters for the GLM, I get the following error:

When creating a StackedEnsemble you must specify one or more models; 24 were specified but none of those were found: [list("glm_grid_model_6", glm_grid_model_11, glm_grid_model_7, glm_grid_model_9, glm_grid_model_2, glm_grid_model_21, glm_grid_model_15, glm_grid_model_0"]

Do you know what the problem is?

Edit: I tried to follow the documentation and used code similar to its example:

gbm_grid <- h2o.grid(algorithm = "gbm",
                     grid_id = "gbm_grid_binomial",
                     x = x,
                     y = y,
                     training_frame = train,
                     ntrees = 10,
                     seed = 1,
                     nfolds = nfolds,
                     fold_assignment = "Modulo",
                     keep_cross_validation_predictions = TRUE,
                     hyper_params = hyper_params,
                     search_criteria = search_criteria)

# Train a stacked ensemble using the GBM grid
ensemble <- h2o.stackedEnsemble(x = x,
                                y = y,
                                training_frame = train,
                                model_id = "ensemble_gbm_grid_binomial",
                                base_models = gbm_grid@model_ids)

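Following @Erin LeDell's suggestion, I removed the extra list() and it works now. However, what I ultimately want to do is build the ensemble from grids of several different model types, along the lines of: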

ensemble <- h2o.stackedEnsemble(x = x,
                                y = y,
                                training_frame = train,
                                model_id = "my_ensemble_binomial",
                                base_models = list(my_gbm, my_rf))

Edit 2:

Solved with the following:

model_list <- as.list(c(glm_grid_1@model_ids,
                        glm_grid_2@model_ids))


ensemble <- h2o.stackedEnsemble(x = predictors,
                                y = response,
                                training_frame = train,
                                model_id = "ensemble1231",
                                base_models = model_list)
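
The combination step above can be checked without a running H2O cluster. This is a minimal base R sketch, assuming each grid's model_ids slot is a list of ID strings; the IDs below are made up for illustration and are not taken from a real grid:

```r
# Hypothetical stand-ins for glm_grid_1@model_ids and glm_grid_2@model_ids.
ids_1 <- list("glm_grid_1_model_0", "glm_grid_1_model_1")
ids_2 <- list("glm_grid_2_model_0", "glm_grid_2_model_1", "glm_grid_2_model_2")

# c() concatenates lists element by element, so the result stays flat:
# one entry per model ID.
model_list <- c(ids_1, ids_2)

# list() would instead keep the two grids as two nested elements.
nested <- list(ids_1, ids_2)

length(model_list)  # 5
length(nested)      # 2
```

Because c() applied to two lists already returns a list, the as.list() wrapper in the snippet above is harmless but should not be strictly necessary.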

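You have an extra list() wrapped around glm_grid@model_ids that you don't need here, and that is probably the source of the error. The glm_grid@model_ids object is already a list. Do this instead: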

ensemble <- h2o.stackedEnsemble(x = predictors,
                                y = response,
                                training_frame = train,
                                model_id = "ensemble",
                                base_models = glm_grid@model_ids) 
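
To see why the extra wrapper matters, here is a small base R sketch that needs no H2O cluster. It assumes model_ids is a plain list of ID strings; the variable names and the shortened ID list are illustrative only:

```r
# Hypothetical stand-in for glm_grid@model_ids.
model_ids <- list("glm_grid_model_6", "glm_grid_model_11", "glm_grid_model_7")

# Wrapping an existing list in list() nests it one level deeper.
wrapped <- list(model_ids)

length(model_ids)            # 3: one entry per model
length(wrapped)              # 1: a single entry that is itself a list
is.character(wrapped[[1]])   # FALSE: the entry is a list, not a model ID
```

With the wrapper, the base_models argument receives one element that is a whole list rather than one element per model, which is consistent with the error message in the question printing list(...) inside the brackets.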

For more details, see the R example here.