Parameter optimization in R and H2O
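I need to do parameter optimization for a gbm model in R with H2O. I am fairly new to H2O, and I think I need to convert ntrees and learn_rate (below) into H2O vectors before running the following.
How do I do this?
Thanks!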
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (i in ntrees){
for j in learn_rate{
n = ntrees[i]
l= learn_rate[j]
gbm_model <- h2o.gbm(features, label, training_frame = train, validation_frame = valid, ntrees=ntrees[[i]],max_depth = 5,learn_rate=learn_rate[j])
print(c(ntrees[i],learn_rate[j],h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
}
}
You can use h2o.grid() to run a grid search:
# specify your hyper parameters
hyper_params = list( ntrees = c(100,200,300,400), learn_rate = c(1,0.5,0.1) )
# then build your grid
grid <- h2o.grid(
## hyper parameters
hyper_params = hyper_params,
## which algorithm to run
algorithm = "gbm",
## identifier for the grid, to later retrieve it
grid_id = "my_grid",
## standard model parameters
x = features,
y = label,
training_frame = train,
validation_frame = valid,
## set a seed for reproducibility
seed = 1234)
You can read more about how h2o.grid() works in the R documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-r/h2o_package.pdf
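Once the grid has finished, the models still need to be compared. A minimal sketch of how that might look, assuming the grid above was built with grid_id = "my_grid", that the response is numeric (so MSE is a sensible metric, as in the question), and that an H2O cluster is already running; the variable names sorted_grid and best_model are only illustrative:

```r
library(h2o)

# Assumption: the grid was already trained with grid_id = "my_grid".
# Fetch it again, ordered by validation MSE, lowest first.
sorted_grid <- h2o.getGrid(grid_id = "my_grid", sort_by = "mse", decreasing = FALSE)
print(sorted_grid)

# The first model id in the sorted grid belongs to the lowest-MSE model.
best_model <- h2o.getModel(sorted_grid@model_ids[[1]])
print(h2o.mse(h2o.performance(best_model, valid = TRUE)))
```

Note that ntrees and learn_rate stay ordinary R vectors inside hyper_params; nothing here needs to be converted to an H2O object first.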
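Using a grid is the best approach here. I'll just quickly point out that what you wrote is a workable approach, and one you can fall back on when a grid cannot do what you need.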
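Your example does not include any data (see https://whosebug.com/help/mcve), so I could not run it, but I corrected a few syntax problems I noticed (R's for-in loop gives you the values directly, not the indices, and the parentheses around the second for loop were missing):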
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (n in ntrees){
for (l in learn_rate){
gbm_model <- h2o.gbm(
features, label, training_frame = train, validation_frame = valid,
ntrees = n,max_depth = 5,learn_rate = l
)
print(c(n,l,h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
}
}
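To see why the original indexing went wrong, here is a small base-R sketch that needs no H2O at all. The variable names (values_seen, out_of_range, fractional, indexed) are made up for illustration:

```r
ntrees <- c(100, 200, 300, 400)
learn_rate <- c(1, 0.5, 0.1)

# for-in iterates over the values: n is 100, 200, 300, 400 in turn.
values_seen <- numeric(0)
for (n in ntrees) {
  values_seen <- c(values_seen, n)
}

# Using a value as if it were an index reads past the end of the vector.
out_of_range <- ntrees[100]    # a 4-element vector has no element 100, so NA
fractional <- learn_rate[0.5]  # the index is truncated to 0, giving an empty vector

# If indices are really wanted, iterate over seq_along() instead.
indexed <- numeric(0)
for (i in seq_along(ntrees)) {
  indexed <- c(indexed, ntrees[i])
}

print(values_seen)
print(out_of_range)
print(length(fractional))
```

So in the question's loop, ntrees[i] with i = 100 would have been NA (and ntrees[[i]] an out-of-bounds error) before h2o.gbm was ever reached.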
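One case where nested loops are useful is when you want to skip certain combinations. For example, you might decide to test a learning rate of 0.1 only with 100 trees, which would look like this: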
ntrees <- c(100,200,300,400)
learn_rate <- c(1,0.5,0.1)
for (n in ntrees){
for (l in learn_rate){
if(l == 0.1 && n > 100)next #Skip when n is 200,300,400
gbm_model <- h2o.gbm(
features, label, training_frame = train, validation_frame = valid,
ntrees = n,max_depth = 5,learn_rate = l
)
print(c(n,l,h2o.mse(h2o.performance(gbm_model, valid = TRUE))))
}
}
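The same skipping logic could also be written by building the list of combinations first and filtering it, which makes it easy to keep the results in one table. This is only a sketch of an alternative, not something from the answers above; combos, keep and the mse column are hypothetical names, and the h2o.gbm call is left commented out because it needs the asker's features, label, train and valid objects:

```r
# All 4 x 3 = 12 combinations of the two parameters.
combos <- expand.grid(ntrees = c(100, 200, 300, 400),
                      learn_rate = c(1, 0.5, 0.1))

# Drop the combinations to skip: learn_rate 0.1 with more than 100 trees.
keep <- !(combos$learn_rate == 0.1 & combos$ntrees > 100)
combos <- combos[keep, ]

# Placeholder column for the validation MSE of each combination.
combos$mse <- NA_real_

for (k in seq_len(nrow(combos))) {
  # With a running H2O cluster and the asker's data, the model would be fit here:
  # gbm_model <- h2o.gbm(features, label, training_frame = train, validation_frame = valid,
  #                      ntrees = combos$ntrees[k], max_depth = 5,
  #                      learn_rate = combos$learn_rate[k])
  # combos$mse[k] <- h2o.mse(h2o.performance(gbm_model, valid = TRUE))
}

print(combos)
```

With the filter above, 9 of the 12 combinations remain, and the only row with learn_rate 0.1 is the one with 100 trees.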