Customize metric visualization in the MLflow UI when using mlflow.tensorflow.autolog()

I am trying to integrate MLflow into my project. Because I train with tf.keras and fit_generator(), I use mlflow.tensorflow.autolog() (see the docs) to enable automatic logging of metrics and parameters:

    import mlflow
    import mlflow.tensorflow
    import tensorflow as tf

    # Unet, IOUScore, FScore, customized_loss, Dataset and DataLoader
    # are defined elsewhere in the project.
    model = Unet()
    optimizer = tf.keras.optimizers.Adam(LEARNING_RATE)

    metrics = [IOUScore(threshold=0.5), FScore(threshold=0.5)]
    model.compile(optimizer, loss=customized_loss, metrics=metrics)

    callbacks = [
        tf.keras.callbacks.ModelCheckpoint("model.h5", save_weights_only=True, save_best_only=True, mode='min'),
        tf.keras.callbacks.TensorBoard(log_dir='./logs', profile_batch=0, update_freq='batch'),
    ]

    train_dataset = Dataset(src_dir=SOURCE_DIR)
    train_data_loader = DataLoader(train_dataset, BATCH_SIZE, shuffle=True)

    with mlflow.start_run():
        # Enable automatic logging of metrics and parameters for this run.
        mlflow.tensorflow.autolog()
        mlflow.log_param("batch_size", BATCH_SIZE)

        model.fit_generator(
            train_data_loader,
            steps_per_epoch=len(train_data_loader),
            epochs=EPOCHS,
            callbacks=callbacks,
        )

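I expected something like this (just a demo taken from the docs):

However, after training finished, this is what I got:

How can I configure autologging so that the metric plots update every epoch and show the value at each epoch, instead of showing only the latest value?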

After searching around, I found this issue related to my problem above. It turned out that each of my metrics was logged only once per training run (instead of once per epoch, as I had intuitively assumed). The reason is that I didn't specify the every_n_iter parameter of mlflow.tensorflow.autolog(), which sets how many 'iterations' must pass between two metric logs (see the docs). Changing my code to:

    mlflow.tensorflow.autolog(every_n_iter=1)

solved the problem.
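To illustrate why the plot collapsed to a single point, here is a rough model of the logging schedule. It is only a sketch, not MLflow's actual code: it assumes that metrics are logged whenever the step index is a multiple of every_n_iter, that one step is one epoch, and that the default in effect at the time was 100. The helper name `logged_steps` is made up for this example.

```python
def logged_steps(num_epochs, every_n_iter):
    """Return the epoch indices at which metrics would be logged
    under the simplified rule: log when epoch % every_n_iter == 0."""
    return [epoch for epoch in range(num_epochs) if epoch % every_n_iter == 0]


# Assumed default of 100: a 10-epoch run only ever logs at epoch 0.
default_like = logged_steps(num_epochs=10, every_n_iter=100)

# every_n_iter=1: every epoch gets its own data point.
per_epoch = logged_steps(num_epochs=10, every_n_iter=1)

print(default_like)
print(per_epoch)
```

Under these assumptions, any run shorter than every_n_iter epochs yields one logged value per metric, which matches the single-value charts I was seeing.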

P.S.: Keep in mind that in TF 2.x an 'iteration' is one epoch (in TF 1.x it is one batch).
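If autologging still does not give the granularity you want, one possible fallback is to log the metrics yourself at the end of every epoch. The sketch below is an assumption-laden illustration, not part of the original fix: `make_epoch_logger` is a hypothetical helper, and the logging function is passed in so the example runs without MLflow installed.

```python
def make_epoch_logger(log_metric):
    """Build an on_epoch_end hook that forwards every Keras metric
    to log_metric(key, value, step=epoch)."""
    def on_epoch_end(epoch, logs=None):
        for name, value in (logs or {}).items():
            # Cast to float so NumPy scalars are logged as plain numbers.
            log_metric(name, float(value), step=epoch)
    return on_epoch_end


# Stand-in for mlflow.log_metric so the sketch is self-contained.
recorded = []

def fake_log_metric(key, value, step=None):
    recorded.append((key, value, step))


hook = make_epoch_logger(fake_log_metric)
hook(0, {"loss": 0.9})
hook(1, {"loss": 0.7})
print(recorded)
```

In a real run you would pass `mlflow.log_metric` instead of the stand-in and add `tf.keras.callbacks.LambdaCallback(on_epoch_end=make_epoch_logger(mlflow.log_metric))` to the callbacks list. Combining this with autolog may log the same metric twice, so it is probably best to use one approach or the other.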