How Do You "Permanently" Delete An Experiment In MLflow?
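Permanently deleting an experiment isn't documented anywhere. I'm using MLflow with a Postgres database as the backend store.
This is what I ran:
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri=server)  # server: URI of the tracking server
client.delete_experiment(1)
This deletes the experiment, but when I run a new experiment with the same name as the one I just deleted, it returns this error: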
mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the experiment to create a new one.
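I can't find anywhere in the documentation that explains how to permanently delete everything.
Unfortunately, it seems there is currently no way to do this via the UI or CLI :-/
How to do it depends on the type of backend store you are using.
File store:
If you are using the filesystem as the storage mechanism (the default), this is easy. 'Deleted' experiments are moved to a .trash folder. You just need to clear it out: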
rm -rf mlruns/.trash/*
As of the current version of the documentation (1.7.2), they remark:
It is recommended to use a cron job or an alternate workflow mechanism to clear .trash folder.
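For anyone who prefers to schedule the cleanup from Python rather than a shell one-liner, here is a minimal sketch. The function name clear_trash and the assumption that the trash folder lives at <mlruns_dir>/.trash are mine; point it at wherever your file store actually is.

```python
import shutil
from pathlib import Path


def clear_trash(mlruns_dir):
    """Remove everything inside <mlruns_dir>/.trash and return the removed names."""
    trash = Path(mlruns_dir) / ".trash"
    removed = []
    if not trash.is_dir():
        return removed  # nothing has been soft-deleted yet
    for entry in sorted(trash.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a deleted experiment is a directory tree
        else:
            entry.unlink()  # plain files and symlinks
        removed.append(entry.name)
    return removed
```

Calling clear_trash("mlruns") from a cron-driven script has the same effect as the rm command above, and the returned list can be logged to see what was purged.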
SQL database:
This is trickier, because there are dependent rows that need to be deleted first. I am using MySQL, and these commands work for me:
USE mlflow_db; # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";
As of MLflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS].
According to the documentation, mlflow gc will:
Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.
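If you want to call this from a maintenance script, a small wrapper like the one below may help. It is only a sketch: the helper names are hypothetical, the URI you pass in is whatever your own backend store uses, and the only option it relies on is --backend-store-uri. Whether gc also purges the deleted experiment record itself may depend on your MLflow version, so check mlflow gc --help before relying on it.

```python
import subprocess


def build_gc_command(backend_store_uri):
    """Return the argv list for `mlflow gc` against the given backend store."""
    return ["mlflow", "gc", "--backend-store-uri", backend_store_uri]


def run_gc(backend_store_uri):
    """Invoke the mlflow CLI; it must be installed and on PATH."""
    # check=True raises CalledProcessError if mlflow exits with a non-zero status.
    subprocess.run(build_gc_command(backend_store_uri), check=True)
```

For example, run_gc("sqlite:///mlflow.db") would garbage-collect a local SQLite store; that URI is a placeholder.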
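Extending @Lee Netherton's answer, you can use PyMySQL to execute these queries and remove all the metadata from the MLflow tracking server after deleting an experiment through the MLflow tracking client.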
import pymysql

def perm_delete_exp():
    # Connection settings are placeholders; adjust them for your server.
    connection = pymysql.connect(
        host='localhost',
        user='user',
        password='password',
        db='mlflow',
        cursorclass=pymysql.cursors.DictCursor)
    queries = """
        USE mlflow;
        DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
        DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
        DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
        DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
        DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
        DELETE FROM experiments where lifecycle_stage="deleted";
    """
    try:
        with connection.cursor() as cursor:
            # Execute one statement per line, skipping the blank first and last lines.
            for query in queries.splitlines():
                if query.strip():
                    cursor.execute(query.strip())
        connection.commit()
    finally:
        # Close the connection even if one of the statements fails.
        connection.close()
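You could (and perhaps should) execute the whole thing as a single query, but I found it easier to debug this way.
If you are using PostgreSQL as the backend store, here are the SQL commands for permanently deleting MLflow's trash.
Switch to your MLflow database, e.g. with: \c mlflow
Then: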
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM params WHERE run_uuid=ANY(
SELECT run_uuid FROM runs where experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
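The difference is that I added a delete command for the 'params' table here.
Unfortunately, in my case the SQL commands above did not work with SQLite.
Here is a version that works with SQLite in a database IDE, with the "ANY" operator replaced by "IN":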
DELETE FROM experiment_tags WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid in (
SELECT run_uuid FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM metrics WHERE run_uuid in (
SELECT run_uuid FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM tags WHERE run_uuid in (
SELECT run_uuid FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM params WHERE run_uuid in (
SELECT run_uuid FROM runs where experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id in (
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
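The same IN-based statements can also be run from Python's built-in sqlite3 module. The sketch below is one way to do it, not an official MLflow procedure: the names STATEMENTS and purge_deleted are hypothetical, and it assumes your database has exactly the tables used in the SQL above. Back up the database file first.

```python
import sqlite3

# Sub-selects shared by the statements below.
DELETED = "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
DELETED_RUNS = f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED})"

# Child tables go first so no row is left pointing at a removed run or experiment.
STATEMENTS = [
    f"DELETE FROM experiment_tags WHERE experiment_id IN ({DELETED})",
    *[
        f"DELETE FROM {table} WHERE run_uuid IN ({DELETED_RUNS})"
        for table in ("latest_metrics", "metrics", "tags", "params")
    ],
    f"DELETE FROM runs WHERE experiment_id IN ({DELETED})",
    "DELETE FROM experiments WHERE lifecycle_stage = 'deleted'",
]


def purge_deleted(conn):
    """Run every delete in one transaction; roll back if any statement fails."""
    with conn:  # commits on success, rolls back on an exception
        for statement in STATEMENTS:
            conn.execute(statement)
```

Usage would be purge_deleted(sqlite3.connect("mlflow.db")), where mlflow.db is a placeholder for your own database file. Running everything in a single transaction means a failure halfway through leaves the database unchanged.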