Compatibility issues with H2O.ai Hadoop on MapR 6.0 via Python API?

There appear to be compatibility issues when running H2O on MapR 6.0 via the 3.18.0.2 MapR 5.2 driver (trying the latest driver, 3.20.0.7, as recommended in another SO post, did not help).

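I am able to start an H2O cluster on MapR 6.0 (via something like hadoop jar h2odriver.jar -nodes 3 -mapperXmx 6g -output hdfsOutputDirName) and seem to be able to access the H2O Flow UI, but I run into problems when accessing the cluster through the Python API (pip show h2o confirms that the package version matches the driver being used).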
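As a complement to pip show h2o, the version match could also be checked against what the running cluster itself reports. The following is only a minimal sketch: it assumes the JSON returned by the cluster's /3/Cloud endpoint carries a "version" field, and the helper name cluster_version_matches and the sample payload are hypothetical.

```python
def cluster_version_matches(client_version, cloud_info):
    """Return True if the cluster-reported version equals the client version.

    cloud_info is the decoded JSON body of GET /3/Cloud. The "version" key
    is an assumption about that payload, not something shown in this post.
    """
    reported = cloud_info.get("version")
    return reported is not None and reported == client_version


# Hypothetical payload standing in for a real /3/Cloud response.
sample_cloud_info = {"version": "3.18.0.2", "cloud_size": 3}

print(cluster_version_matches("3.18.0.2", sample_cloud_info))
print(cluster_version_matches("3.20.0.7", sample_cloud_info))
```

With a real cluster, client_version would come from the installed package (the value pip show h2o prints) and cloud_info from the cluster's REST response.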

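Is the MapR 5.2 driver (currently the latest MapR driver version that H2O offers) simply incompatible with MapR 6.0? (I would not ask, except that the H2O Flow UI seems to work on the cluster instance started on MapR 6.0.) Are there any workarounds other than using the standalone driver version (I would still like to be able to use YARN on the hadoop cluster)?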

The code and the error seen when trying to connect to the running H2O instance with the Python API are shown below.

# connect to h2o service
h2o.init(ip=h2o_cnxn_ip)

where h2o_cnxn_ip is the IP and port reported after starting the h2o cluster on the MapR 6.0 system. This produces the error:

Checking whether there is an H2O instance running at http://172.18.0.123:54321...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-1728877a03a2> in <module>()
      1 # connect to h2o service
----> 2 h2o.init(ip=h2o_cnxn_ip)

/home/me/projects/myproject/lib/python2.7/site-packages/h2o/h2o.pyc in init(url, ip, port, https, insecure, username, password, cookies, proxy, start_h2o, nthreads, ice_root, enable_assertions, max_mem_size, min_mem_size, strict_version_check, ignore_config, extra_classpath, **kwargs)
    250                                      auth=auth, proxy=proxy,cookies=cookies, verbose=True,
    251                                      _msgs=("Checking whether there is an H2O instance running at {url}",
--> 252                                             "connected.", "not found."))
    253     except H2OConnectionError:
    254         # Backward compatibility: in init() port parameter really meant "baseport" when starting a local server...

/home/me/projects/myproject/lib/python2.7/site-packages/h2o/backend/connection.pyc in open(server, url, ip, port, https, auth, verify_ssl_certificates, proxy, cookies, verbose, _msgs)
    316             conn._stage = 1
    317             conn._timeout = 3.0
--> 318             conn._cluster = conn._test_connection(retries, messages=_msgs)
    319             # If a server is unable to respond within 1s, it should be considered a bug. However we disable this
    320             # setting for now, for no good reason other than to ignore all those bugs :(

/home/me/projects/myproject/lib/python2.7/site-packages/h2o/backend/connection.pyc in _test_connection(self, max_retries, messages)
    558                 raise H2OServerError("Local server was unable to start")
    559             try:
--> 560                 cld = self.request("GET /3/Cloud")
    561                 if cld.consensus and cld.cloud_healthy:
    562                     self._print(" " + messages[1])

/home/me/projects/myproject/lib/python2.7/site-packages/h2o/backend/connection.pyc in request(self, endpoint, data, json, filename, save_to)
    400                                     auth=self._auth, verify=self._verify_ssl_cert, proxies=self._proxies)
    401             self._log_end_transaction(start_time, resp)
--> 402             return self._process_response(resp, save_to)
    403 
    404         except (requests.exceptions.ConnectionError, requests.exceptions.HTTPError) as e:

/home/me/projects/myproject/lib/python2.7/site-packages/h2o/backend/connection.pyc in _process_response(response, save_to)
    711         if content_type == "application/json":
    712             try:
--> 713                 data = response.json(object_pairs_hook=H2OResponse)
    714             except (JSONDecodeError, requests.exceptions.ContentDecodingError) as e:
    715                 raise H2OServerError("Malformed JSON from server (%s):\n%s" % (str(e), response.text))

/home/me/projects/myproject/lib/python2.7/site-packages/requests/models.pyc in json(self, **kwargs)
    882                 try:
    883                     return complexjson.loads(
--> 884                         self.content.decode(encoding), **kwargs
    885                     )
    886                 except UnicodeDecodeError:

/usr/lib64/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    349     if parse_constant is not None:
    350         kw['parse_constant'] = parse_constant
--> 351     return cls(encoding=encoding, **kw).decode(s)

/usr/lib64/python2.7/json/decoder.pyc in decode(self, s, _w)
    364 
    365         """
--> 366         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    367         end = _w(s, end).end()
    368         if end != len(s):

/usr/lib64/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    380         """
    381         try:
--> 382             obj, end = self.scan_once(s, idx)
    383         except StopIteration:
    384             raise ValueError("No JSON object could be decoded")

/home/me/projects/myproject/lib/python2.7/site-packages/h2o/backend/connection.pyc in __new__(cls, keyvals)
    823         for k, v in keyvals:
    824             if k == "__meta" and isinstance(v, dict):
--> 825                 schema = v["schema_name"]
    826                 break
    827             if k == "__schema" and is_type(v, str):

KeyError: u'schema_name'
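The traceback shows the client requesting GET /3/Cloud and then failing while reading v["schema_name"] from a "__meta" entry of the decoded JSON. One way to narrow this down is to look at the raw response outside the h2o client and see which "__meta" objects lack that key. The following is a minimal diagnostic sketch, not part of the h2o package; the helper name and both sample payloads are hypothetical.

```python
import json


def find_meta_without_schema_name(node, path="$"):
    """Recursively collect JSON paths whose "__meta" dict lacks "schema_name"."""
    missing = []
    if isinstance(node, dict):
        meta = node.get("__meta")
        if isinstance(meta, dict) and "schema_name" not in meta:
            missing.append(path)
        # Descend into every value so nested schemas are checked as well.
        for key, value in node.items():
            missing.extend(
                find_meta_without_schema_name(value, "%s.%s" % (path, key)))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            missing.extend(
                find_meta_without_schema_name(item, "%s[%d]" % (path, index)))
    return missing


# Hypothetical payloads for illustration only; a real one could be fetched
# with requests.get("http://172.18.0.123:54321/3/Cloud").json().
healthy = json.loads('{"__meta": {"schema_name": "CloudV3"}, "cloud_size": 3}')
suspect = json.loads('{"__meta": {"schema_type": "Cloud"}, "nodes": [{"__meta": {}}]}')

print(find_meta_without_schema_name(healthy))
print(find_meta_without_schema_name(suspect))
```

An empty list would mean every "__meta" object carries "schema_name"; any listed path points at the part of the response that the client's H2OResponse hook cannot handle.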

H2O does not currently support MapR 6. At present, the highest MapR version H2O supports is 5.2.

For the supported Hadoop versions, see the downloads page.