H2O AutoML - how to provide weights
Here is my example, using the `Default` data set from the ISLR package. The data is imbalanced, so I rebalance it with class weights and run H2O AutoML using only GBM.
```r
library(ISLR)
library(h2o)
library(magrittr)
library(dplyr)
library(parallel)  # provides detectCores()

core_count <- detectCores()
h2o.init(nthreads = (core_count - 1))

my_df <- Default
# Predictors: every column except the response (the weights column is added later)
x <- setdiff(colnames(my_df), 'default')
y <- 'default'

# Class weights: 60% of the total weight goes to "No", 40% to "Yes"
my_df %<>% mutate(weights = if_else(default == 'No',
                                    0.6 / table(my_df$default)[[1]],
                                    0.4 / table(my_df$default)[[2]]))

aml_test <- h2o.automl(x = x, y = y,
                       training_frame = as.h2o(my_df[1:8000, ]),
                       validation_frame = as.h2o(my_df[8001:10000, ]),
                       nfolds = 0,
                       weights_column = "weights",
                       include_algos = c('GBM'),
                       seed = 12345,
                       max_runtime_secs = 1200)
```
It produces the following errors:

```
09:46:49.611: Skipping training of model GBM_1_AutoML_20210821_094649 due to exception:
water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model:
GBM_1_AutoML_20210821_094649. Details: ERRR on field: _min_rows: The dataset size is too
small to split for min_rows=1.0: must have at least 2.0 (weighted) rows, but have only
0.7172904568994339.
09:46:49.622: Skipping training of model GBM_2_AutoML_20210821_094649 due to exception:
water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model:
GBM_2_AutoML_20210821_094649. Details: ERRR on field: _min_rows: The dataset size is too
small to split for min_rows=10.0: must have at least 20.0 (weighted) rows, but have only
0.7172904568994339.
09:46:49.630: Skipping training of model GBM_3_AutoML_20210821_094649 due to exception:
water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model:
GBM_3_AutoML_20210821_094649. Details: ERRR on field: _min_rows: The dataset size is too
small to split for min_rows=10.0: must have at least 20.0 (weighted) rows, but have only
0.7172904568994339.
09:46:49.637: Skipping training of model GBM_4_AutoML_20210821_094649 due to exception:
water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model:
GBM_4_AutoML_20210821_094649. Details: ERRR on field: _min_rows: The dataset size is too
small to split for min_rows=10.0: must have at least 20.0 (weighted) rows, but have only
0.7172904568994339.
09:46:49.644: Skipping training of model GBM_5_AutoML_20210821_094649 due to exception:
water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model:
GBM_5_AutoML_20210821_094649. Details: ERRR on field: _min_rows: The dataset size is too
small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only
0.7172904568994339.
|===================================================================================| 100%
09:49:50.241: Empty leaderboard.
AutoML was not able to build any model within a max runtime constraint of 1200 seconds,
you may want to increase this value before retrying.The leaderboard contains zero models:
try running AutoML for longer (the default is 1 hour).
```
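Basically, as soon as I provide class weights, GBM does not work. It works fine without weights. AutoML does not even run for the full 20 minutes, and no model is generated.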
Your output contains this error message:

```
Details: ERRR on field: _min_rows: The dataset size is too
small to split for min_rows=10.0: must have at least 20.0 (weighted) rows, but have only
0.7xxxx.
```
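With the weighting used in the question, each class gets its share divided by the class size, so the weights over the full `Default` data add up to exactly 1 (here $n_{\text{No}}$ and $n_{\text{Yes}}$ are the class counts):

$$
\sum_{i=1}^{n} w_i = n_{\text{No}} \cdot \frac{0.6}{n_{\text{No}}} + n_{\text{Yes}} \cdot \frac{0.4}{n_{\text{Yes}}} = 0.6 + 0.4 = 1
$$

$$
\sum_{i \in \text{train}} w_i < 1 < 2 \cdot \texttt{min\_rows} \quad \text{for every } \texttt{min\_rows} \ge 1
$$

So any subset, such as the 8,000-row training frame, has a total weight below 1, consistent with the 0.717 in the log, while GBM needs at least 2 × `min_rows` weighted rows (2, 20 or 200 for the `min_rows` values AutoML tried).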
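It looks like you need to increase the weight values and/or increase the number of rows. Try multiplying the weights column by 10 or 100 and see if that helps. I suspect this would not be a problem if you set the weights column to all ones.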
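A factor of 10 or 100 may still fall short for the larger `min_rows` values AutoML tries (the log shows `min_rows=100.0`, which needs 200 weighted rows, and 0.717 × 100 is only about 72). A minimal base-R sketch of an alternative, assuming you want to keep the 60/40 split: rescale the weights so they sum to the number of rows, like all-ones weights do. `make_balanced_weights` is a hypothetical helper, not an h2o function, and the label vector below is a shuffled stand-in for `ISLR::Default$default` (9667 "No", 333 "Yes") so the snippet runs without ISLR or an H2O cluster.

```r
# Hypothetical helper (not part of h2o): class-balancing weights that keep
# the requested class shares but sum to the number of rows.
make_balanced_weights <- function(labels, shares = c(No = 0.6, Yes = 0.4),
                                  rescale = TRUE) {
  labels <- as.character(labels)
  counts <- table(labels)
  # Question's scheme: class share / class size, so all weights sum to 1
  w <- unname(shares[labels]) / as.numeric(counts[labels])
  # Multiply by the row count: same 60/40 split, mean weight of 1
  if (rescale) w <- w * length(labels)
  w
}

# Stand-in for ISLR::Default$default, shuffled so the first 8000 rows mix classes
set.seed(12345)
default <- sample(factor(rep(c("No", "Yes"), times = c(9667, 333))))

w_orig <- make_balanced_weights(default, rescale = FALSE)
w_new  <- make_balanced_weights(default)

train <- 1:8000
sum(w_orig[train])           # below 1, so GBM's 2 * min_rows check fails
sum(w_new[train])            # on the order of 8000
tapply(w_new, default, sum)  # about 6000 for "No" and 4000 for "Yes"
```

In the question's script, this would replace the `mutate()` call with `my_df$weights <- make_balanced_weights(my_df$default)`. Only the ratio between weights affects how the classes are balanced; the overall scale is what the `min_rows` check looks at.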