Error using h2o.glm in R

I am new to the h2o implementation in R. I have a data frame like this (df):

df<-structure(list(v1 = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093, 
1.69665, 0.46366, 1.53827, 2.0149, 2.32103, 1.87223, 2.3392, 
2.10579, 1.7236, 1.13056, 1.09144, 3.52515, 1.16248, 1.77885, 
0.9991, 0.47375, 2.91148, 1.237, 1.18971, 1.23953, 1.07049, 1.46971, 
1.65649, 3.3021, 1.04816), v100 = c(19.60784, 9.27047, 0.5523, 
15.05735, 0.93231, 11.73979, 19.53795, 6.22754, 4.54464, 17.0922, 
3.60958, 18.23052, 0.06395, 17.17605, 5.52724, 17.85276, 15.57143, 
0.05825, 19.85401, 14.51163, 6.64372, 19.60284, 16.40279, 16.89205, 
19.6748, 14.64446, 19.34747, 9.04215, 11.37993, 16.81159), v101 = c(10.71683, 
7.13707, 3.61956, 9.75558, 4.21413, 8.49785, 6.79572, 5.19486, 
7.39523, 6.05496, 2.91676, 9.82552, 5.5107, 5.40719, 10.82138, 
12.37154, 5.56351, 3.8549, 9.87455, 5.37746, 3.57747, 8.11406, 
6.61883, 7.3667, 7.74248, 12.44785, 12.38174, 5.99648, 7.10452, 
8.27756)), .Names = c("v1", "v100", "v101"), row.names = c(85671L, 
92268L, 44249L, 68218L, 3250L, 105583L, 4874L, 94393L, 83502L, 
61414L, 42987L, 50200L, 80887L, 9321L, 39565L, 79644L, 26265L, 
75272L, 104819L, 72782L, 57101L, 59037L, 78810L, 88619L, 21564L, 
39198L, 55030L, 44193L, 6116L, 101448L), class = "data.frame")

I want to fit a GLM using the h2o package, so I have the following code:

  library(h2o)
  library(h2oEnsemble)

  modellm<-h2o.glm(y="v1", x="v100",training_frame=df ,family="gaussian",
                   nfolds = 0, alpha = 0.1, lambda_search = FALSE)

However, running the code produces the following error:

Error in value[[3L]](cond) : 
  argument "training_frame" must be a valid H2OFrame or ID

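I tried the solution recommended in a related thread, but it did not solve my problem. After applying the solution recommended there, I got the following output: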
> library(devtools)
> install_github("h2oai/h2o-3/h2o-r/ensemble/h2oEnsemble-package")
Downloading github repo h2oai/h2o-3@master
Installing h2oEnsemble
"C:/PROGRA~1/R/R-32~1.4R~/bin/x64/R" --no-site-file --no-environ  \
  --no-save --no-restore CMD INSTALL  \
  "C:/Users/ozgur/AppData/Local/Temp/RtmpAfGU5K/devtools8f064866e23/h2oai-h2o-3-30ef929/h2o-r/ensemble/h2oEnsemble-package"  \
  --library="C:/Users/ozgur/Documents/R/win-library/3.2"  \
  --install-tests 

* installing *source* package 'h2oEnsemble' ...
** R
** tests
** preparing package for lazy loading
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
*** arch - x64
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
* DONE (h2oEnsemble)
Reloading installed h2oEnsemble
h2oEnsemble (beta) for H2O >=3.0
Version: 0.1.8
Package created on 2016-03-29  

I would be very grateful for any help. Many thanks.

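If you only want to train an H2O GLM, you do not need the h2oEnsemble package, so you can remove library(h2oEnsemble) from your code. After library(h2o), you must also add the line h2o.init(nthreads = -1), which starts an H2O cluster in the background using all available cores. The "H2O cluster" is the optimized Java code that runs the computations in parallel.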

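The problem you are running into is with your training_frame. In H2O, the training_frame argument must be an "H2OFrame", not an ordinary R data.frame. For scalability, H2O uses distributed data frames called "H2OFrames" rather than standard in-memory data.frame objects.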
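A quick way to confirm this diagnosis is to check the class of the object passed as training_frame. The sketch below uses base R only; the three-row df is a stand-in for the data frame in the question, not the original data.

```r
# Stand-in for the data frame from the question (first three rows only)
df <- data.frame(v1   = c(5.24823, 0.839, 3.57348),
                 v100 = c(19.60784, 9.27047, 0.5523))

# A plain R data.frame does not carry the "H2OFrame" class,
# which is what h2o.glm() expects for training_frame
df_class     <- class(df)
is_h2o_frame <- inherits(df, "H2OFrame")

print(df_class)
print(is_h2o_frame)
```

If is_h2o_frame is FALSE, the object has to be converted or imported into the H2O cluster before it can be used for training.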

To convert df into an H2OFrame and train the GLM, do the following:

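library(h2o)
h2o.init(nthreads = -1)  # start a local H2O cluster using all available cores

hdf <- as.h2o(df)  # convert the R data.frame to an H2OFrame
modellm <- h2o.glm(y = "v1", x = "v100", training_frame = hdf, family = "gaussian",
                   nfolds = 0, alpha = 0.1, lambda_search = FALSE)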
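Once a model has been trained this way, its coefficients and predictions can be inspected from R. The following is a minimal sketch, assuming the h2o package and Java are installed and a local cluster can be started; the ten-row df_small is a subset of the question's data used only for illustration.

```r
library(h2o)
h2o.init(nthreads = -1)

# Illustrative subset of the data from the question
df_small <- data.frame(
  v1   = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093,
           1.69665, 0.46366, 1.53827, 2.0149, 2.32103),
  v100 = c(19.60784, 9.27047, 0.5523, 15.05735, 0.93231,
           11.73979, 19.53795, 6.22754, 4.54464, 17.0922)
)

hdf_small <- as.h2o(df_small)  # data.frame -> H2OFrame
fit <- h2o.glm(y = "v1", x = "v100", training_frame = hdf_small,
               family = "gaussian", nfolds = 0, alpha = 0.1,
               lambda_search = FALSE)

coefs <- h2o.coef(fit)                 # named vector: Intercept and v100
preds <- h2o.predict(fit, hdf_small)   # predictions come back as an H2OFrame
preds_df <- as.data.frame(preds)       # pull them into a regular data.frame

print(coefs)
print(head(preds_df))
```

Note that predictions are also returned as an H2OFrame, so as.data.frame() is needed when the results should be used with ordinary R functions.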

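Alternatively, if your data is in a CSV file, for example, you can use the h2o.importFile() function to load it directly into the H2O cluster, so you do not need to convert it from an R data.frame to an H2OFrame.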
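The CSV route might look like the sketch below. It assumes a local H2O cluster (so that the file path is visible to the H2O server), and the temporary file df_example.csv is a hypothetical file created here only so the example is self-contained.

```r
library(h2o)
h2o.init(nthreads = -1)

# Hypothetical CSV file, written to a temporary directory for illustration
csv_path <- file.path(tempdir(), "df_example.csv")
write.csv(data.frame(v1   = c(5.24823, 0.839, 3.57348, 1.47869),
                     v100 = c(19.60784, 9.27047, 0.5523, 15.05735)),
          csv_path, row.names = FALSE)

# Import straight into the H2O cluster; the result is already an H2OFrame
hdf_csv <- h2o.importFile(path = csv_path)

print(class(hdf_csv))
print(dim(hdf_csv))
```

The resulting hdf_csv can be passed to h2o.glm() as training_frame without calling as.h2o().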

Since you are new to H2O, I suggest taking a look at this Jupyter R notebook that I created to teach people how to use H2O. Welcome to H2O!