Why is my loss function returning nan?

Question

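I defined this custom loss function in Keras, using the TensorFlow backend, to train a background-extraction autoencoder. It is supposed to make sure that the prediction x_hat doesn't stray too far from B0, the median of the predictions taken over the batch.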

def ben_loss(x, x_hat):

    B0 = tf_median(tf.transpose(x_hat))
    sigma = tf.reduce_mean(tf.sqrt(tf.abs(x_hat - B0) / 0.4), axis=0)
    # I divide by sigma in the next step. So I add a small float32 to sigma
    # so as to prevent background_term from becoming a nan.
    sigma += 1e-22 
    background_term = tf.reduce_mean(tf.abs(x_hat - B0) / sigma, axis=-1)
    bce = binary_crossentropy(x, x_hat)
    loss = bce + background_term

    return loss

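When I try to train the network with this loss function, the loss becomes NaN almost immediately. Does anyone know why this is happening? You can reproduce the error by cloning my repository and running this script.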

Answer 1

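This happens because tf.abs(x_hat - B0) approaches a tensor whose entries are all zero. That makes the derivative of sigma with respect to x_hat NaN: the gradient of tf.sqrt is infinite at zero, and it is multiplied by the gradient of tf.abs, which is zero there. Adding a small constant to sigma after the square root does not change that gradient. The solution is to add a small value to the quantity inside the square root instead.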

def ben_loss(x, x_hat):

    B0 = tf_median(tf.transpose(x_hat))
    F0 = tf.abs(x_hat - B0) + 1e-10
    sigma = tf.reduce_mean(tf.sqrt(F0 / 0.4), axis=0)
    background_term = tf.reduce_mean(F0 / sigma, axis=-1)
    bce = binary_crossentropy(x, x_hat)
    loss = bce + background_term

    return loss
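
To make the mechanism concrete, here is a rough sketch in plain Python that applies the chain rule by hand to sqrt(|u| + eps). It does not call TensorFlow; the helper names (sqrt_grad, abs_grad, grad_sqrt_abs) are made up for illustration, and the assumption is that they mirror how the gradients of tf.sqrt and tf.abs behave at zero (infinite and zero, respectively).

```python
import math

def sqrt_grad(v):
    # d/dv sqrt(v) = 0.5 / sqrt(v); in IEEE floating point this is +inf at v == 0.
    # Plain Python would raise ZeroDivisionError, so the inf is written explicitly.
    return math.inf if v == 0.0 else 0.5 / math.sqrt(v)

def abs_grad(u):
    # Assumed to match the gradient of tf.abs: sign(u), which is 0 at u == 0.
    return 0.0 if u == 0.0 else math.copysign(1.0, u)

def grad_sqrt_abs(u, eps=0.0):
    # Chain rule for d/du sqrt(|u| + eps).
    return sqrt_grad(abs(u) + eps) * abs_grad(u)

print(grad_sqrt_abs(0.0))             # inf * 0 -> nan
print(grad_sqrt_abs(0.0, eps=1e-10))  # finite * 0 -> 0.0
```

With eps = 0 the product is inf * 0, which is NaN, and a single NaN in the gradient is enough to turn the weights and the loss into NaN after one update. With a small eps inside the square root the first factor stays finite, so the gradient at zero is simply 0.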

gradient-descent

keras

tensorflow