MLFlow active run does not match environment run id

Question

I am trying to execute an MLFlow run, but I keep getting the following error even after trying many things.


import mlflow

# End any run left active by a previous execution
run = mlflow.active_run()
if run:
    print("Active run_id: {}".format(run.info.run_id))
    mlflow.end_run()

mlflow.set_experiment('TNF_EXP') 
mlflow.set_tracking_uri("http://localhost:5000/") # Actual Server URI instead of localhost
experiment = mlflow.get_experiment_by_name("TNF_EXP")

with mlflow.start_run(experiment_id=experiment.experiment_id) as run:
    ...
    ...

mlflow.end_run()

Error:

File "/.../ModelTrainer.py", line 108, in train
    with mlflow.start_run(experiment_id=experiment.experiment_id) as run:
  File "/usr/local/lib/python3.6/site-packages/mlflow/tracking/fluent.py", line 207, in start_run
    "arguments".format(existing_run_id)
mlflow.exceptions.MlflowException: Cannot start run with ID e9953eb5918845bb9be1xxxxxx because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
2021/02/11 09:41:36 ERROR mlflow.cli: === Run (ID 'e9953eb5918845bb9be1xxxxxx') failed ===

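I noticed that I had an active run earlier, so I added the first if block to end that run. The code then ran successfully and I was able to log data to the MLFlow UI, but now when I run it I get the same error again, even though no active run is found before the new run starts.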

FYI, I am running the code on an Azure server, with the corresponding tracking URI set in the code.

However, if I include the argument --experiment-name="TNF_EXP" in the mlflow run CLI command, the code runs fine.

Answer 1

This is mostly because you kicked off the run under the default experiment name and then tried to set the experiment name to "TNF_EXP".

I would suggest using the Python method mlflow.run(..., experiment_name="TNF_EXP"), so that the run picks up the experiment name just as it would from the CLI.

You can find more information here.
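
A minimal sketch of that suggestion, assuming the project (the directory containing the MLproject file) is the current directory. The function name launch_project and its defaults are hypothetical; only mlflow.run and its uri and experiment_name arguments come from MLflow itself.

```python
def launch_project(project_uri=".", experiment_name="TNF_EXP"):
    """Submit an MLflow project run under the given experiment."""
    import mlflow  # imported lazily so this module loads without side effects

    # mlflow.run is the Python counterpart of the `mlflow run` CLI command;
    # experiment_name plays the role of --experiment-name.
    return mlflow.run(uri=project_uri, experiment_name=experiment_name)
```

Calling launch_project() from a small launcher script should then start the project under TNF_EXP, so the training code no longer needs to call set_experiment itself.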

Answer 2

The behavior currently supported for MLflow projects is to define the experiment name, or the ID if you know it, through the mlflow CLI.

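Changing this would require a systemic change to how MLflow projects are executed, because the mlflow run CLI command creates a main run (under the --experiment-name argument or the default experiment). The nested runs created by start_run in main need to belong to the same parent experiment.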

If you want to log under a specific experiment, the only supported mode is to use --experiment-name or --experiment-id in the CLI command.

You can now run it as: mlflow run . --experiment-name test
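
If the same script also has to work when started directly with python, one possible pattern is to call set_experiment only when no run was handed over by the CLI. This is a sketch under an assumption: to my understanding, mlflow run passes the run it created to the script through the MLFLOW_RUN_ID environment variable, and start_run() with no arguments resumes that run. The helper names launched_by_mlflow_run and train are hypothetical.

```python
import os

# Assumption: `mlflow run` exports the ID of the run it created in this
# environment variable before launching the entry point.
RUN_ID_ENV_VAR = "MLFLOW_RUN_ID"


def launched_by_mlflow_run(environ=None):
    """Return True when a run ID was handed over through the environment."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(RUN_ID_ENV_VAR))


def train(experiment_name="TNF_EXP"):
    import mlflow  # imported lazily so the helper above has no dependency

    if not launched_by_mlflow_run():
        # Started with plain `python`: no run exists yet, so it is safe to
        # choose the experiment here.
        mlflow.set_experiment(experiment_name)

    # Under `mlflow run`, the experiment was already fixed by the CLI
    # (--experiment-name / --experiment-id), so no experiment_id is passed.
    with mlflow.start_run() as run:
        print("Run ID: {}".format(run.info.run_id))
        # training and logging go here
```

With this layout the experiment is chosen in exactly one place per launch mode, which avoids the mismatch reported in the error message.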
