Customize metric visualization in the MLflow UI when using mlflow.tensorflow.autolog()
I am trying to integrate MLflow into my project. Since I train with tf.keras.fit_generator(), I use mlflow.tensorflow.autolog() (docs here) to enable automatic logging of metrics and parameters:
import mlflow
import tensorflow as tf
# Unet, IOUScore, FScore, Dataset, and DataLoader are defined elsewhere in my project

model = Unet()
optimizer = tf.keras.optimizers.Adam(LEARNING_RATE)
metrics = [IOUScore(threshold=0.5), FScore(threshold=0.5)]
model.compile(optimizer, customized_loss, metrics)

callbacks = [
    tf.keras.callbacks.ModelCheckpoint("model.h5", save_weights_only=True, save_best_only=True, mode='min'),
    tf.keras.callbacks.TensorBoard(log_dir='./logs', profile_batch=0, update_freq='batch'),
]

train_dataset = Dataset(src_dir=SOURCE_DIR)
train_data_loader = DataLoader(train_dataset, BATCH_SIZE, shuffle=True)

with mlflow.start_run():
    # enable MLflow autologging for TensorFlow/Keras
    mlflow.tensorflow.autolog()
    mlflow.log_param("batch_size", BATCH_SIZE)
    model.fit_generator(
        train_data_loader,
        steps_per_epoch=len(train_data_loader),
        epochs=EPOCHS,
        callbacks=callbacks
    )
I expected something like this (just a demo taken from the docs):
However, after training finished, what I got was:
How can I configure things so that the metric charts are updated with a value at every epoch, rather than showing only the latest value?
After searching around, I found this issue related to my problem above. It turns out that all my metrics were logged only once per training run (instead of once per epoch, as I had expected). The reason is that I had not specified the every_n_iter parameter of mlflow.tensorflow.autolog(), which indicates how many 'iterations' must pass before MLflow logs the metrics (see the docs). Therefore, changing my code to:
mlflow.tensorflow.autolog(every_n_iter=1)
solved the problem.
P/s: keep in mind that in TF 2.x an 'iteration' means an epoch (while in TF 1.x it means a batch).
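For reference, here is a minimal sketch of the corrected run block, reusing the same model, data loader, and constants as in the question above. With every_n_iter=1, the Keras metrics are logged at every epoch, so the charts in the MLflow UI show a full curve instead of a single point.

with mlflow.start_run():
    # log metrics every 'iteration' (i.e. every epoch in TF 2.x)
    mlflow.tensorflow.autolog(every_n_iter=1)
    mlflow.log_param("batch_size", BATCH_SIZE)
    model.fit_generator(
        train_data_loader,
        steps_per_epoch=len(train_data_loader),
        epochs=EPOCHS,
        callbacks=callbacks
    )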