您如何 "Permanently" 在 Mlflow 中删除实验?

How Do You "Permanently" Delete An Experiment In Mlflow?

没有任何地方记录永久删除实验。我正在使用带有后端 postgres db

的 Mlflow

这是我的运行:

client = MlflowClient(tracking_uri=server)
client.delete_experiment(1)

这会删除实验,但是当我 运行 一个与我刚刚删除的实验同名的新实验时,它将 return 这个错误:

mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the  experiment to create a new one.

我在说明如何永久删除所有内容的文档中找不到任何地方。

不幸的是,目前似乎无法通过 UI 或 CLI 执行此操作:-/

执行此操作的方式取决于您使用的后端文件存储的类型。

文件存储:

如果您使用文件系统作为存储机制(默认),那么这很容易。 'deleted' 实验已移至 .trash 文件夹。你只需要清除它:

rm -rf mlruns/.trash/*

documentation (1.7.2) 的当前版本开始,他们表示:

It is recommended to use a cron job or an alternate workflow mechanism to clear .trash folder.

SQL 数据库:

这比较棘手,因为需要删除依赖项。我正在使用 MySQL,这些命令对我有用:

USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";

从 mlflow 1.11.0 开始,在实验中永久删除运行的推荐方法是:mlflow gc [OPTIONS]

根据文档,mlflow gc

Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.

在从 MLFlow 跟踪客户端删除实验后,扩展 @Lee Netherton's answer, you can use PyMySQL 以执行这些查询并从 MLFlow 跟踪服务器中删除所有元数据。

import pymysql

def perm_delete_exp():
    connection = pymysql.connect(
        host='localhost',
        user='user',
        password='password',
        db='mlflow',
        cursorclass=pymysql.cursors.DictCursor)
    with connection.cursor() as cursor:
        queries = """
            USE mlflow;
            DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM experiments where lifecycle_stage="deleted";
        """
        for query in queries.splitlines()[1:-1]:
            cursor.execute(query.strip())
    connection.commit()
    connection.close()

您可以(也许应该)一次执行整个查询,但我发现这样调试更容易。

如果您使用 PostgreSQL 作为后端存储,如果您想永久删除 MLFlow 的垃圾,我将添加 SQL 命令。

更改为您的 MLFlow 数据库,例如通过使用:\c mlflow 然后:

DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs where experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

区别在于,我在此处添加了 'params' Table SQL 删除命令。

不幸的是,在我的例子中,上面的 SQL 命令不适用于 SQLITE。 这是 SQL 版本在数据库 IDE 中使用 sqlite,将“any”命令替换为“in”:

DELETE FROM experiment_tags WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    );
DELETE FROM latest_metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid in (
    SELECT run_uuid FROM runs where experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';