h2o.automl: NaN values in leaderboard

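I am running the h2o.automl() example from: http://h2o-release.s3.amazonaws.com/h2o/master/3888/docs-website/h2o-docs/automl.html. Everything works fine except for the NaN values in the leaderboard. Predictions work as well. Is this a bug, or am I doing something wrong?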

library(h2o)

localH2O <- h2o.init(ip = "localhost",
                 port = 54321, 
                 nthreads = -1, 
                 min_mem_size = "20g")

train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

y <- "response"
x <- setdiff(names(train), y)

train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])

aml <- h2o.automl(x = x, y = y,
              training_frame = train,
              leaderboard_frame = test,
              max_runtime_secs = 30)

lb <- aml@leaderboard
lb

                                   model_id auc logloss
1  StackedEnsemble_0_AutoML_20170908_094736 NaN     NaN
2  StackedEnsemble_0_AutoML_20170908_094407 NaN     NaN
3 GBM_grid_0_AutoML_20170908_094736_model_1 NaN     NaN
4 GBM_grid_0_AutoML_20170908_094407_model_0 NaN     NaN
5 GBM_grid_0_AutoML_20170908_094407_model_1 NaN     NaN
6 GBM_grid_0_AutoML_20170908_094736_model_0 NaN     NaN

I checked H2O Flow at localhost:54321 and it shows normal values. I also get normal values when I use h2o.getFrame():
h2o.getFrame("leaderboard")
                                   model_id      auc  logloss
1  StackedEnsemble_0_AutoML_20170908_094736 0,787145 0,554983
2  StackedEnsemble_0_AutoML_20170908_094407 0,785154 0,556897
3 GBM_grid_0_AutoML_20170908_094736_model_1 0,778587 0,563741
4 GBM_grid_0_AutoML_20170908_094407_model_0 0,776755 0,564247
5 GBM_grid_0_AutoML_20170908_094407_model_1 0,776640 0,564436
6 GBM_grid_0_AutoML_20170908_094736_model_0 0,774611 0,566920

I am using h2o v. 3.15.0.4018

h2o.clusterInfo()
R is connected to the H2O cluster: 
H2O cluster uptime:         2 hours 8 minutes 
H2O cluster version:        3.15.0.4018 
H2O cluster version age:    15 hours and 47 minutes  
H2O cluster name:           H2O_started_from_R_maju116_ozj558 
H2O cluster total nodes:    1 
H2O cluster total memory:   19.03 GB 
H2O cluster total cores:    8 
H2O cluster allowed cores:  8 
H2O cluster healthy:        TRUE 
H2O Connection ip:          localhost 
H2O Connection port:        54321 
H2O Connection proxy:       NA 
H2O Internal Security:      FALSE 
H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
R Version:                  R version 3.4.1 (2017-06-30) 

Session info:

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8    
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8   
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.2       purrr_0.2.3       readr_1.1.1       tidyr_0.7.1      
[5] tibble_1.3.4      ggplot2_2.2.1     tidyverse_1.1.1   h2oEnsemble_0.2.1
[9] h2o_3.15.0.4018  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12     cellranger_1.1.0 compiler_3.4.1   plyr_1.8.4      
 [5] bindr_0.1        forcats_0.2.0    bitops_1.0-6     tools_3.4.1     
 [9] lubridate_1.6.0  jsonlite_1.5     nlme_3.1-131     gtable_0.2.0    
[13] lattice_0.20-35  pkgconfig_2.0.1  rlang_0.1.2      psych_1.7.5     
[17] parallel_3.4.1   haven_1.1.0      bindrcpp_0.2     xml2_1.1.1      
[21] httr_1.3.1       stringr_1.2.0    hms_0.3          grid_3.4.1      
[25] glue_1.1.1       R6_2.2.2         readxl_1.0.0     foreign_0.8-69  
[29] modelr_0.1.1     reshape2_1.4.2   magrittr_1.5     scales_0.5.0    
[33] rvest_0.3.2      assertthat_0.2.0 mnormt_1.5-5     colorspace_1.3-2
[37] stringi_1.1.5    lazyeval_0.2.0   munsell_0.4.3    RCurl_1.95-4.8  
[41] broom_0.4.2 

Just a hunch, but try running R in the en_US locale.
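A minimal sketch of one way to try this from inside R, assuming the en_US.UTF-8 locale is installed on the machine (running `locale -a` in a shell lists what is available). H2O itself runs in a JVM, so if the cluster is started from R, the environment variables probably need to be set before h2o.init() is called; whether the JVM actually picks them up is an assumption here, not something I have verified.

```r
# Inspect the locale categories R is currently using.
old_locale <- Sys.getlocale()
print(old_locale)

# Ask child processes (such as a JVM started later by h2o.init()) to use en_US.
# Assumption: the en_US.UTF-8 locale exists on this system.
Sys.setenv(LANG = "en_US.UTF-8", LC_ALL = "en_US.UTF-8")

# Switch the running R session too. suppressWarnings() hides the warning R
# emits when the requested locale is not installed (the result is then "").
new_locale <- suppressWarnings(Sys.setlocale("LC_ALL", "en_US.UTF-8"))
locale_changed <- nzchar(new_locale)
print(locale_changed)
```

If locale_changed is FALSE, the locale is not available and the session is unchanged. An already running H2O cluster would need to be shut down and restarted for any of this to matter.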

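If that fixes it, I think what is happening is that aml@leaderboard is choking on the commas in the floating-point numbers (the decimal commas you can see in the h2o.getFrame("leaderboard") output), and that is where the NaNs come from. I.e. it is a display bug, not a data bug.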
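To illustrate the suspected mechanism in plain R: as.numeric() only accepts a period as the decimal separator, so a string with a decimal comma fails to parse. This is not H2O's actual parsing code, only a sketch of the same kind of failure. The values are copied from the h2o.getFrame() output in the question. Base R reports NA here rather than NaN, so the analogy is about the parsing failure, not the exact value shown.

```r
# AUC values as they appear with a decimal comma in the h2o.getFrame() output.
auc_text <- c("0,787145", "0,785154", "0,778587")

# as.numeric() expects "." as the decimal mark, so every value fails to parse.
parsed_raw <- suppressWarnings(as.numeric(auc_text))
print(parsed_raw)

# Replacing the comma with a period first recovers the numbers.
parsed_fixed <- as.numeric(sub(",", ".", auc_text, fixed = TRUE))
print(parsed_fixed)
```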

(If that does fix it, it might also be useful to know what happens when you run both H2O and R in the same pl_PL.UTF-8 locale.)