h2o gridsearch "TypeError: unsupported operand" error

Question

I am seeing the error

TypeError: unsupported operand type(s) for +: 'NoneType' and 'unicode'

when trying to train a model in h2o using grid search, and I cannot work out the cause.

This is the output printed just before the error:

drf Grid Build progress: |████████████████████████████████████████████████| 100%
Errors/Warnings building gridsearch model

Hyper-parameter: col_sample_rate_per_tree, 0.75
Hyper-parameter: max_depth, 5
Hyper-parameter: min_rows, 4096.0
Hyper-parameter: min_split_improvement, 1e-08
Hyper-parameter: mtries, 8
Hyper-parameter: nbins, 8
Hyper-parameter: nbins_cats, 64
Hyper-parameter: ntrees, 96
Hyper-parameter: sample_rate, 0.6320000291
failure_details: None
failure_stack_traces: java.lang.NullPointerException
    at hex.tree.SharedTree.init(SharedTree.java:164)
    at hex.tree.drf.DRF.init(DRF.java:53)
    at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:207)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:222)
    at hex.ModelBuilder.trainModelNested(ModelBuilder.java:348)
    at hex.ModelBuilder$TrainModelNestedRunnable.run(ModelBuilder.java:383)
    at water.H2O.runOnH2ONode(H2O.java:1304)
    at water.H2O.runOnH2ONode(H2O.java:1297)
    at hex.ModelBuilder.trainModelNested(ModelBuilder.java:364)
    at hex.grid.GridSearch.buildModel(GridSearch.java:343)
    at hex.grid.GridSearch.gridSearch(GridSearch.java:220)
    at hex.grid.GridSearch.access$000(GridSearch.java:71)
    at hex.grid.GridSearch.compute2(GridSearch.java:138)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1416)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

and this is the code used to create the grid search object:

model = h2o.h2o.H2ORandomForestEstimator(
                response_column=configs['RESPONSE'],
                keep_cross_validation_models=False,
                keep_cross_validation_predictions=False
            )

random_forest_grid = h2o.h2o.H2OGridSearch(model=model, 
                     hyper_params=configs['HYPERPARAMETER_RANGES'], 
                     search_criteria=configs['SEARCH_CRITERIA'])
.
.
.

max_train_time_hrs = 8
# here is where the ERROR is thrown
random_forest_grid.train(x=training_features, y=training_response,
                         weights_column='weight',
                         training_frame=train_u, validation_frame=test_u,
                         max_runtime_secs=max_train_time_hrs * 60 * 60)

The configs referenced here is a dictionary along these lines:

configs = {
.
.
.
 'HYPERPARAMETER_RANGES': {
        'ntrees': [32, 64, 96, 128],  # default 50
        'nbins_cats': [16, 32, 64, 128, 512, 1024],  # default is 1024
        'nbins': [8, 13, 21, 34],  # default is 20
        'max_depth': [5, 8, 13],  # default is 20
        'mtries': [-1, 5, 8, 13],  # default is -1 for the square root of number of features
        'min_split_improvement': [1 * 10 ** -8,
                                  1 * 10 ** -5,
                                  1 * 10 ** -3],
        'min_rows': [16, 64, 256, 1024, 4096],  # minimum number of (weighted) observations required in a leaf
        'col_sample_rate_per_tree': [0.75, 0.9, 1],  # default is 1
        'sample_rate': [0.5, 0.6320000291, 0.75]  # default is 0.6320000291
},
'SEARCH_CRITERIA': {
    'strategy': 'RandomDiscrete',
    'max_models': 24,
    'seed': 64,
    'stopping_metric': 'AUTO',  # log-loss
 }

}

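Note that grid search works for some other DRF models I am training (with exactly the same hyperparameter ranges and search criteria), and I cannot find any significant difference between those working versions and this failing one. Are there any common causes in h2o that would throw this kind of error? Any theories or further debugging suggestions would be appreciated.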

Answer 1

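Found the cause of the error by checking the logs in the h2o Flow UI. I would say this is a good h2o debugging tip, since it seems some errors are printed only there and not to standard error output.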

06-20 12:39:02.188 172.18.4.64:54321     27694  FJ-1-11   INFO: Building H2O DRF model with these parameters:
06-20 12:39:02.188 172.18.4.64:54321     27694  FJ-1-11   INFO: {"_train":{"name":"py_9_sid_827e","type":"Key"},"_valid":{"name":"py_10_sid_827e","type":"Key"},"_nfolds":0,"_keep_cross_validation_models":false,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":111,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":null,"_ignore_const_cols":true,"_weights_column":"weight","_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":28800.0,"_stopping_rounds":0,"_stopping_metric":"AUTO","_stopping_tolerance":0.001,"_response_column":"DENIAL","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_export_checkpoints_dir":null,"_ntrees":96,"_max_depth":13,"_min_rows":64.0,"_nbins":13,"_nbins_cats":16,"_min_split_improvement":1.0E-5,"_histogram_type":"AUTO","_r2_stopping":1.7976931348623157E308,"_nbins_top_level":1024,"_build_tree_one_node":false,"_score_tree_interval":0,"_initial_score_interval":4000,"_score_interval":4000,"_sample_rate":0.6320000291,"_sample_rate_per_class":null,"_calibrate_model":false,"_calibration_frame":null,"_col_sample_rate_change_per_level":1.0,"_col_sample_rate_per_tree":1.0,"_binomial_double_trees":true,"_mtries":5}
06-20 12:39:02.189 172.18.4.64:54321     27694  FJ-1-11   ERRR: _weights_column: Weights column 'weight' not found in the training frame
06-20 12:39:02.189 172.18.4.64:54321     27694  FJ-1-11   ERRR: _weights_column: Weights column 'weight' not found in the training frame

It turns out the problem was that the column passed as the weights_column parameter to the grid search did not actually exist in the H2OFrame being used.
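A check like the following, run before calling train(), would probably have surfaced the problem immediately. This is only a minimal sketch: the helper names missing_columns and check_training_columns are hypothetical (not part of h2o), and it works on a plain list of column names, which I am assuming you get from the frame's columns attribute.

```python
def missing_columns(frame_columns, required):
    """Return the names in `required` that are absent from `frame_columns`, in order."""
    present = set(frame_columns)
    return [name for name in required if name not in present]


def check_training_columns(frame_columns, x, y, weights_column=None):
    """Raise ValueError early if any column the training call needs is missing.

    Hypothetical helper, not an h2o API: it only compares lists of names.
    """
    required = list(x) + [y]
    if weights_column is not None:
        required.append(weights_column)
    missing = missing_columns(frame_columns, required)
    if missing:
        raise ValueError("columns not found in frame: %s" % ", ".join(missing))
```

With the names from the question, that would be something like check_training_columns(train_u.columns, training_features, training_response, weights_column='weight') just before random_forest_grid.train(...), and it is probably worth running the same check against the validation frame.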

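I will try to trim down the question post so that it is more relevant to others who may find it based on the title alone (since the standard error printed in the console gives no indication of the specific problem).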
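If the log is long, filtering it down to the error-level lines makes the real message easier to spot. The sketch below assumes the log text has been copied out of the Flow UI (I believe h2o.download_all_logs() in the Python client can also fetch the logs as an archive) and that error lines carry the "ERRR:" tag seen in the excerpt above. The function name error_messages is hypothetical.

```python
def error_messages(log_text):
    """Return the distinct messages from ERRR-level log lines, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        # partition() yields an empty tag when "ERRR:" is not in the line.
        _, tag, message = line.partition("ERRR:")
        if not tag:
            continue
        message = message.strip()
        # The same error is often logged more than once, so keep one copy.
        if message and message not in seen:
            seen.append(message)
    return seen
```

Applied to the excerpt in this answer, this should leave just the single "_weights_column: Weights column 'weight' not found in the training frame" message.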