H2O AutoML keeps getting Unexpected HTTP error

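I have already run h2o on one dataset with exactly this code, and I am now trying it on a different dataset. However, I keep getting 'Unexpected HTTP error'.

The code is as follows: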

import h2o
h2o.init()
train_data = h2o.import_file("pathtofile.csv")
x = train_data.columns
y = "Class"
x.remove(y)
train_data[y] = train_data[y].asfactor()
from h2o.automl import H2OAutoML
aml = H2OAutoML(max_models=10, seed=1,  max_runtime_secs=57600)
aml.train(x=x, y=y, training_frame=train_data)

The error at this point is:

---------------------------------------------------------------------------
H2OConnectionError                        Traceback (most recent call last)
<ipython-input-14-435d6f31b64e> in <module>()
      1 from h2o.automl import H2OAutoML
      2 aml = H2OAutoML(max_models=10, seed=1,  max_runtime_secs=57600)
----> 3 aml.train(x=x, y=y, training_frame=train_data)

/opt/anaconda3/envs/ege/lib/python2.7/site-packages/h2o/automl/autoh2o.pyc in train(self, x, y, training_frame, fold_column, weights_column, validation_frame, leaderboard_frame, blending_frame)
    443         poll_updates = ft.partial(self._poll_training_updates, verbosity=self._verbosity, state={})
    444         try:
--> 445             self._job.poll(poll_updates=poll_updates)
    446         finally:
    447             poll_updates(self._job, 1)

/opt/anaconda3/envs/ege/lib/python2.7/site-packages/h2o/job.pyc in poll(self, poll_updates)
     55             pb = ProgressBar(title=self._job_type + " progress", hidden=hidden)
     56             if poll_updates:
---> 57                 pb.execute(self._refresh_job_status, print_verbose_info=ft.partial(poll_updates, self))
     58             else:
     59                 pb.execute(self._refresh_job_status)

/opt/anaconda3/envs/ege/lib/python2.7/site-packages/h2o/utils/progressbar.pyc in execute(self, progress_fn, print_verbose_info)
    169                 # Query the progress level, but only if it's time already
    170                 if self._next_poll_time <= now:
--> 171                     res = progress_fn()  # may raise StopIteration
    172                     assert_is_type(res, (numeric, numeric), numeric)
    173                     if not isinstance(res, tuple):

/opt/anaconda3/envs/ege/lib/python2.7/site-packages/h2o/job.pyc in _refresh_job_status(self)
     92     def _refresh_job_status(self):
     93         if self._poll_count <= 0: raise StopIteration("")
---> 94         jobs = h2o.api("GET /3/Jobs/%s" % self.job_key)
     95         self.job = jobs["jobs"][0] if "jobs" in jobs else jobs["job"][0]
     96         self.status = self.job["status"]

/opt/anaconda3/envs/ege/lib/python2.7/site-packages/h2o/h2o.pyc in api(endpoint, data, json, filename, save_to)
    102     # type checks are performed in H2OConnection class
    103     _check_connection()
--> 104     return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
    105 
    106 

/opt/anaconda3/envs/ege/lib/python2.7/site-packages/h2o/backend/connection.pyc in request(self, endpoint, data, json, filename, save_to)
    439             else:
    440                 self._log_end_exception(e)
--> 441                 raise H2OConnectionError("Unexpected HTTP error: %s" % e)
    442         except requests.exceptions.Timeout as e:
    443             self._log_end_exception(e)

H2OConnectionError: Unexpected HTTP error: ('Connection aborted.', error(104, 'Connection reset by peer'))

I have tried h2o.cluster().shutdown() and killing the process, but I keep getting the error above.

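It turns out that one column in the dataset contains names with non-ASCII characters such as 'Ö' and 'Ş' (apparently not stored as valid UTF-8). After dropping this column, it worked again. In my opinion, this should be fixed by H2O in a later version.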
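
To find such columns before handing the file to H2O, a check along these lines might help. This is only a sketch, assuming Python 3 and a CSV small enough to scan in memory; the helper name non_ascii_columns and the sample column names are made up for illustration and are not part of h2o.

```python
import csv
import io


def non_ascii_columns(rows):
    """Return the names of columns where any value has a non-ASCII character."""
    flagged = []
    for row in rows:
        for name, value in row.items():
            if name in flagged or not isinstance(value, str):
                continue
            if any(ord(ch) > 127 for ch in value):
                flagged.append(name)
    return flagged


# Tiny stand-in for the real file; these columns and values are hypothetical.
sample = "Name,Amount,Class\nÖmer,10,0\nAyşe,20,1\nJohn,30,0\n"
rows = list(csv.DictReader(io.StringIO(sample)))

bad_columns = non_ascii_columns(rows)
print(bad_columns)  # ['Name']
```

For a real file, the rows would come from csv.DictReader over the opened file instead of the in-memory sample. The flagged columns can then be dropped (or cleaned) before calling aml.train, which is what resolved the error in my case.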