TensorFlow checkpoint message does not match the training metric

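I set up a checkpoint callback with checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_path, save_best_only=True, monitor='val_auc', verbose=1). When I look at my training log, the checkpoint message does not seem to match the logged metrics.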

:Epoch 00004: val_auc improved from 0.96440 to 0.96298, saving model to xxxxxxx
py log:2878/2878 - 352s - loss: 0.2071 - tp: 1207371.0000 - fp: 66819.0000 - tn: 1484009.0000 - fn: 187884.0000 - accuracy: 0.9135 - precision: 0.9476 - recall: 0.8653 - auc: 0.9698 - pr: 0.9731 - val_loss: 0.2388 - val_tp: 338551.0000 - val_fp: 5482.0000 - val_tn: 76038.0000 - val_fn: 49446.0000 - val_accuracy: 0.8830 - val_precision: 0.9841 - val_recall: 0.8726 - val_auc: 0.9630 - val_pr: 0.9921

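Why is val_auc reported as improved when it went from a larger value to a smaller one, and why does the checkpoint message show 0.96298 while the training log shows 0.9630?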

You need to set the mode parameter to max so that the model is saved only when val_auc reaches a new maximum.

checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_path, mode="max", save_best_only=True, monitor='val_auc', verbose=1)

The documentation for the mode parameter explains this:

mode: one of {'auto', 'min', 'max'}. If save_best_only=True, the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity. For val_acc, this should be max, for val_loss this should be min, etc. In auto mode, the mode is set to max if the quantities monitored are 'acc' or start with 'fmeasure' and are set to min for the rest of the quantities.

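Because the default is auto and you are monitoring val_auc, which is neither acc nor anything starting with fmeasure, the mode is set to min. In min mode a lower value counts as an improvement, which is why the drop from 0.96440 to 0.96298 was reported as "improved".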
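The rule quoted from the documentation can be sketched in a few lines of plain Python. This is only an illustration of the documented behaviour, not the Keras source: resolve_mode and is_improvement are hypothetical helper names, and the substring check on 'acc' is an assumption about how the rule is applied. The exact rule may differ between TensorFlow versions.

```python
def resolve_mode(monitor: str, mode: str = "auto") -> str:
    """Hypothetical helper: pick 'min' or 'max' following the documented rule."""
    if mode in ("min", "max"):
        return mode  # an explicit mode always wins
    # 'auto': only accuracy-like or fmeasure metrics are maximized
    # (assumption: 'acc' is matched as a substring, so 'val_acc' also counts).
    if "acc" in monitor or monitor.startswith("fmeasure"):
        return "max"
    return "min"  # everything else, including 'val_auc', is minimized


def is_improvement(current: float, best: float, mode: str) -> bool:
    """Hypothetical helper: does `current` beat `best` under the given mode?"""
    return current > best if mode == "max" else current < best


auto_mode = resolve_mode("val_auc")                        # falls through to 'min'
print(auto_mode)                                           # min
print(is_improvement(0.96298, 0.96440, auto_mode))         # True: a drop "improves"
print(is_improvement(0.96298, 0.96440, resolve_mode("val_auc", "max")))  # False
```

With mode="max" passed explicitly, the same drop from 0.96440 to 0.96298 is no longer treated as an improvement, so the checkpoint would not be overwritten.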

The val_auc in the training log is rounded to 4 decimal places, which is why you see it as 0.9630 there rather than 0.96298.
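The rounding can be checked directly. The snippet below is a small sketch that formats the same value with 4 and with 5 decimal places. The format strings are assumptions chosen to reproduce the two values shown in the question, not a copy of the Keras logging code.

```python
val_auc = 0.96298  # value taken from the checkpoint message in the question

# Assumed formats: 4 decimals for the training log, 5 for the checkpoint message.
training_log_text = f"{val_auc:.4f}"
checkpoint_text = f"{val_auc:.5f}"

print(training_log_text)  # 0.9630
print(checkpoint_text)    # 0.96298
```

Both strings come from the same underlying number, so the two log lines are consistent with each other.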