Custom metric turns to NaN after many steps in each epoch

I'm using custom recall and precision metrics in my model. I know these are built into Keras, but I only care about one of the classes.

When I start an epoch, the metrics print values, but after many steps one metric returns NaN, and a few hundred steps later the second custom metric shows NaN as well.

The recall metric is written the same way.

import tensorflow as tf
import tensorflow.keras.backend as K

def precision(y_true, y_pred):
    '''
    Calculates precision metric over gun label
    Precision = TP/(TP+FP)
    '''
    #I only care about the last label
    y_true = y_true[:,-1]
    y_pred = y_pred[:,-1]
    y_pred = tf.where(y_pred>.5, 1, 0)

    y_pred = tf.cast(y_pred, tf.float32)
    y_true = tf.cast(y_true, tf.float32)

    true_positives = K.sum(y_true * y_pred)
    false_positive = tf.math.reduce_sum(tf.where(tf.logical_and(tf.not_equal(y_true,y_pred), y_pred==1), 1, 0))
    false_positive = tf.cast(false_positive, tf.float32)
    precision = true_positives / (true_positives + false_positive)
    return precision

I'm training a multi-label model, so my last dense layer is:

preds = Dense(num_classes, activation='sigmoid', name='Classifier')(x)

model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy', precision, recall])
model.fit(train_ds, steps_per_epoch=10000, validation_data=valid_ds, validation_steps=1181,  epochs=200)
18/10000 [............] - ETA: 6:43 - loss: 0.6919 - accuracy: 0.0046 - precision: 0.2597 - recall: 0.4691

315/10000 [...........] - ETA: 7:56 - loss: 0.4174 - accuracy: 0.1145 - precision: nan - recall: 0.6115

10000/10000 [=========>] - ETA: 0s - loss: 0.0797 - accuracy: 0.5432 - precision: nan - recall: nan
10000/10000 [=========>] - 576s 56ms/step - loss: 0.0797 - accuracy: 0.5432 - precision: nan - recall: nan - val_loss: 0.0557 - val_accuracy: 0.5807 - val_precision: 0.9698 - val_recall: 0.9529

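At the start of each epoch the metrics show numbers again, but after many steps they go back to NaN. From watching the output, I can confirm they do not go to 0 or 1 before turning into NaN.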

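The problem was division by zero. It happens whenever a batch produces a zero denominator: for precision, when the network makes no positive predictions in that batch (TP + FP = 0); for recall, when the batch contains no positive labels (TP + FN = 0). That is why it showed up intermittently. I added a small value to each denominator to fix it.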
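To see why a single bad batch is enough, here is a minimal framework-free sketch. It assumes the progress bar shows the running average of the per-batch metric values and that this average starts over at each epoch, which is consistent with the log above. The function name `progress_bar_values` and the batch values are made up for illustration.

```python
import math

def progress_bar_values(batch_values):
    """Return the running mean after each batch within one epoch."""
    total = 0.0
    shown = []
    for step, value in enumerate(batch_values, start=1):
        total += value            # NaN plus anything stays NaN
        shown.append(total / step)
    return shown

# Hypothetical per-batch precision values; the third batch hit 0/0.
epoch_1 = progress_bar_values([0.25, 0.75, math.nan, 1.0])

# The accumulator starts from zero again in the next epoch,
# so the displayed value is a number until the next 0/0 batch.
epoch_2 = progress_bar_values([0.5, 1.0])
```

Once the NaN enters the running total, every later value in that epoch is NaN, even though the following batches are fine. That would also explain why the value never passes through 0 or 1 first.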

import tensorflow.keras.backend as K

precision = true_positives / (true_positives + false_positive + K.epsilon())
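The same guard applies to recall. As a rough illustration in plain Python (not the TensorFlow code itself), the sketch below computes both metrics for one batch of 0/1 labels with the epsilon in each denominator. `EPSILON` is assumed to match the default of `K.epsilon()`, and `precision_recall` is a hypothetical helper.

```python
EPSILON = 1e-7  # assumed to match the default value of K.epsilon()

def precision_recall(y_true, y_pred, eps=EPSILON):
    """Precision and recall for one batch of 0/1 labels, guarded against 0/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return precision, recall

# A batch with no positive predictions: precision would be 0/0 without eps.
p_empty, r_empty = precision_recall([1, 0, 0, 1], [0, 0, 0, 0])

# An ordinary batch: eps changes the result only negligibly.
p_normal, r_normal = precision_recall([1, 0, 1, 1], [1, 1, 1, 0])
```

Note the trade-off: a batch with an empty denominator now contributes 0 rather than being undefined, so the epoch average is pulled down slightly by such batches.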