Unable to connect to MLFLOW_TRACKING_URI when running mlflow run in a Docker container

I have set up an MLflow server locally at http://localhost:5000

I followed the instructions at https://github.com/mlflow/mlflow/tree/master/examples/docker and tried to run the example Docker project with

/mlflow/examples/docker$ mlflow run . -P alpha=0.5

But I got the following error.

2021/05/09 17:11:20 INFO mlflow.projects.docker: === Building docker image docker-example:7530274 ===
2021/05/09 17:11:20 INFO mlflow.projects.utils: === Created directory /tmp/tmp9wpxyzd_ for downloading remote URIs passed to arguments of type 'path' ===
2021/05/09 17:11:20 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /home/mlf/mlf/0/ae69145133bf49efac22b1d390c354f1/artifacts:/home/mlf/mlf/0/ae69145133bf49efac22b1d390c354f1/artifacts -e MLFLOW_RUN_ID=ae69145133bf49efac22b1d390c354f1 -e MLFLOW_TRACKING_URI=http://localhost:5000 -e MLFLOW_EXPERIMENT_ID=0 docker-example:7530274 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID 'ae69145133bf49efac22b1d390c354f1' === 
/opt/conda/lib/python2.7/site-packages/mlflow/__init__.py:55: DeprecationWarning: MLflow support for Python 2 is deprecated and will be dropped in a future release. At that point, existing Python 2 workflows that use MLflow will continue to work without modification, but Python 2 users will no longer get access to the latest MLflow features and bugfixes. We recommend that you upgrade to Python 3 - see https://docs.python.org/3/howto/pyporting.html for a migration guide.
  "for a migration guide.", DeprecationWarning)
Traceback (most recent call last):
  File "train.py", line 56, in <module>
    with mlflow.start_run():
  File "/opt/conda/lib/python2.7/site-packages/mlflow/tracking/fluent.py", line 122, in start_run
    active_run_obj = MlflowClient().get_run(existing_run_id)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/tracking/client.py", line 96, in get_run
    return self._tracking_client.get_run(run_id)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 49, in get_run
    return self.store.get_run(run_id)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/store/tracking/rest_store.py", line 92, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/store/tracking/rest_store.py", line 32, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/utils/rest_utils.py", line 133, in call_endpoint
    host_creds=host_creds, endpoint=endpoint, method=method, params=json_body)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/utils/rest_utils.py", line 70, in http_request
    url=url, headers=headers, verify=verify, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/utils/rest_utils.py", line 51, in request_with_ratelimit_retries
    response = requests.request(**kwargs)
  File "/opt/conda/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/get?run_uuid=ae69145133bf49efac22b1d390c354f1&run_id=ae69145133bf49efac22b1d390c354f1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5cbd80d690>: Failed to establish a new connection: [Errno 111] Connection refused',))
2021/05/09 17:11:22 ERROR mlflow.cli: === Run (ID 'ae69145133bf49efac22b1d390c354f1') failed ===

Is there a way to fix this? I tried adding the following to the MLproject file, but it did not help

environment: [["network", "host"], ["add-host", "host.docker.internal:host-gateway"]]

Thanks for your help! =)

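Run the MLflow server so that it listens on your machine's IP address instead of localhost. Then point mlflow run at that IP instead of http://localhost:5000. The main reason is that localhost inside the Docker process refers to the container itself, not to your machine, so nothing is listening on port 5000 there and the connection is refused.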
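
To make the idea concrete, here is a minimal sketch in Python that rewrites a loopback tracking URI into one a container could reach. The helper names `guess_host_ip` and `container_reachable_uri`, and the sample address 192.168.1.10, are hypothetical and not part of MLflow. The IP detection is a best-effort guess and may pick the wrong interface on a machine with several networks.

```python
import socket
from urllib.parse import urlsplit, urlunsplit

# Host names that resolve to the container itself when used inside Docker.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def guess_host_ip(fallback="127.0.0.1"):
    """Best-effort guess of this machine's LAN IP (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface, whose address we then read back.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): return the fallback.
        return fallback
    finally:
        sock.close()


def container_reachable_uri(tracking_uri, host_ip):
    """Replace a loopback host in tracking_uri with host_ip.

    URIs that already point at a non-loopback host are returned unchanged.
    Any user:password part of the URI is dropped by this simple sketch.
    """
    parts = urlsplit(tracking_uri)
    if parts.hostname not in LOOPBACK_HOSTS:
        return tracking_uri
    netloc = host_ip if parts.port is None else f"{host_ip}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    uri = container_reachable_uri("http://localhost:5000", guess_host_ip())
    # Export this value as MLFLOW_TRACKING_URI before calling mlflow run.
    print(uri)
```

The log in the question shows that mlflow run forwards MLFLOW_TRACKING_URI into the container with -e, so exporting the rewritten value before running should be enough on the client side. The server also has to accept connections on that address; one way, assuming the standard CLI options, is to start it with mlflow server --host 0.0.0.0 --port 5000.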