MLflow 在每个纪元后保存权重

MLflow saving weights after each epoch

我一直在使用 MLflow 跟踪测试一些小示例,但对于我的用例,我希望在每个纪元之后保存权重。 有时我会在运行完全完成之前终止它们(我不能使用提前停止),但我现在的体验是权重没有保存到跟踪 ui 服务器。 有没有办法在每个时代之后做到这一点?

将权重保存到磁盘,然后将它们记录为工件。只要 checkpoints/weights 保存到磁盘,您就可以用 mlflow_log_artifact()mlflow_log_artifacts() 记录它们。来自docs,

mlflow.log_artifact() logs a local file or directory as an artifact, optionally taking an artifact_path to place it in within the run’s artifact URI. Run artifacts can be organized into directories, so you can place the artifact in a directory this way.

mlflow.log_artifacts() logs all the files in a given directory as artifacts, again taking an optional artifact_path.