Is it possible to use loaded h2o grids for stacked ensembles?
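I am currently working with R and the h2o library to process several datasets with different machine learning methods. Because I run several 10-fold cross-validations on each dataset, I run a random grid search for each dataset and save the grid with h2o.saveGrid. When I load these grids again to build an ensemble with h2o.stackedEnsemble, it returns the error message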
Error: water.exceptions.H2OIllegalArgumentException: Failed to find the xval predictions frame id. Looks like keep_cross_validation_predictions wasn't set when building the models.
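However, keep_cross_validation_predictions is set to TRUE, and if I use the grid without saving and loading it, everything works fine. So I suspect that something gets lost during saving and loading.
Does anyone know whether there is a way to use loaded grids for stacked ensembles in h2o, or is this not supported yet? I would appreciate any insight, as it would save me a lot of time. I cannot keep all of the grids in my h2o cluster the whole time.
I am using R 3.6.3 and h2o 3.32.0.1.
A minimal working example that reproduces the error for me: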
library(h2o)
h2o.init()

# Simulated training data: one response and four predictors
train_data <- data.frame(y = rnorm(100, 1, 2),
                         x1 = rnorm(100, 5, 5),
                         x2 = rnorm(100, 4, 4),
                         x3 = rnorm(100, 3, 3),
                         x4 = rnorm(100, 2, 2))

# Hyperparameter space and random search settings
params <- list(max_depth = seq(1, 6, 1),
               sample_rate = seq(0.2, 1.0, 0.1))
search_criteria <- list(strategy = "RandomDiscrete", max_models = 10, seed = 2102)

train_h2o <- as.h2o(train_data, destination_frame = "Train")

# Random grid search with 10-fold CV, keeping the CV predictions for stacking
gbm_grid <- h2o.grid("gbm", y = "y", x = c("x1", "x2", "x3", "x4"), training_frame = train_h2o,
                     grid_id = "gbm_1", nfolds = 10, ntrees = 50, seed = 1111,
                     keep_cross_validation_predictions = TRUE,
                     hyper_params = params,
                     search_criteria = search_criteria)

# Stacking works while the grid is still in the cluster
test_ens <- h2o.stackedEnsemble(y = "y", x = c("x1", "x2", "x3", "x4"), training_frame = train_h2o,
                                metalearner_algorithm = "glm", model_id = "Ens1",
                                base_models = gbm_grid@model_ids[1:10])
h2o.performance(test_ens)

# Save the grid to disk
h2o.saveGrid(grid_directory = paste0(getwd(), "/Data"), grid_id = "gbm_1")
Training the ensemble from the loaded grid produces the error:
h2o.removeAll()
train_h2o <- as.h2o(train_data, destination_frame = "Train")
gbm_grid <- h2o.loadGrid(paste0(getwd(), "/Data/gbm_1"))
test_ens <- h2o.stackedEnsemble(y = "y", x = c("x1", "x2", "x3", "x4"), training_frame = train_h2o,
                                metalearner_algorithm = "glm", model_id = "Ens2",
                                base_models = gbm_grid@model_ids[1:10])
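I have also tried setting export_checkpoints_dir in h2o.grid and loading all models manually (including their automatically generated CV fold models, which, unlike with h2o.saveGrid, are also saved this way), but that does not change anything.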
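To narrow down the guess that the cross-validation predictions are what gets lost, one could check each loaded model for its holdout predictions frame. This is only a minimal sketch: has_cv_predictions is a hypothetical helper name, and it assumes that h2o.cross_validation_holdout_predictions either returns NULL or raises an error when the frame is not on the cluster, and that gbm_grid is the object returned by h2o.loadGrid.

```r
library(h2o)

# Hypothetical helper: TRUE if the model's cross-validation holdout
# predictions can still be fetched from the cluster, FALSE otherwise.
has_cv_predictions <- function(model_id) {
  tryCatch({
    model <- h2o.getModel(model_id)
    preds <- h2o.cross_validation_holdout_predictions(model)
    !is.null(preds)
  }, error = function(e) FALSE)
}

# Assumes gbm_grid is the grid object returned by h2o.loadGrid.
if (exists("gbm_grid")) {
  cv_status <- vapply(unlist(gbm_grid@model_ids), has_cv_predictions, logical(1))
  print(cv_status)
}
```

Running the same check before h2o.removeAll() and again after h2o.loadGrid would show whether the predictions frames are present in one case and missing in the other.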
Cheers