Intuition behind categorical cross entropy

Question

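I'm trying to implement the categorical cross-entropy loss function by hand to better understand the intuition behind it. So far, my implementation looks like this: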

import numpy as np

# Observations
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0.05], [0.1, 0.8, 0.1]])

# Loss calculations
def categorical_loss():
    loss1 = -(0.0 * np.log(0.05) + 1.0 * np.log(0.95) + 0 * np.log(0.05))
    loss2 = -(0.0 * np.log(0.1) + 0.0 * np.log(0.8) + 1.0 * np.log(0.1))
    loss = (loss1 + loss2) / 2  # divided by 2 because there are 2 observations (rows); the 3 classes are summed inside each loss term
    return loss

# Show loss
print(categorical_loss())  # 1.176939193690798
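The hard-coded terms above follow a single pattern: for each observation, sum y_true * log(y_pred) over the classes and negate it, then average over the observations. A minimal vectorised sketch of that pattern, assuming both arrays have shape (n_observations, n_classes); the name `categorical_loss_vectorized` is just illustrative:

```python
import numpy as np

def categorical_loss_vectorized(y_true, y_pred):
    # Per-observation loss: negated sum over classes of y_true * log(y_pred)
    per_obs = -np.sum(y_true * np.log(y_pred), axis=-1)
    # Average over the observations (rows)
    return np.mean(per_obs)

y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0.05], [0.1, 0.8, 0.1]])
print(categorical_loss_vectorized(y_true, y_pred))  # about 1.17694, same as the hard-coded version
```

Unlike the hard-coded function, this works for any number of observations and classes.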

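But I don't understand how the function should behave so that it still returns the correct value when:

At least one number in y_pred is 0 or 1: the log function returns -inf or 0 for these values, so what should the implementation look like in that case?
At least one number in y_true is 0: multiplying by 0 always returns 0, so the value of np.log(0.95) would then be discarded. What should the implementation look like in that case?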
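To make the first case concrete, here is a small sketch of what the naive formula does when y_pred contains an exact 0 (a 1 on its own is harmless, since log(1) = 0, but with probabilities that sum to 1 it forces the other entries to 0). The example arrays are made up for illustration:

```python
import numpy as np

y_true = np.array([[0.0, 1.0, 0.0]])

# Case 1: the true class gets probability 0, so -log(0) = inf
y_pred_zero_true = np.array([[0.5, 0.0, 0.5]])
# Case 2: a wrong class gets probability 0, so 0 * log(0) = 0 * -inf = nan
y_pred_zero_other = np.array([[0.0, 1.0, 0.0]])

# Silence the "divide by zero" and "invalid value" warnings NumPy would emit
with np.errstate(divide="ignore", invalid="ignore"):
    loss_inf = -np.sum(y_true * np.log(y_pred_zero_true), axis=-1)
    loss_nan = -np.sum(y_true * np.log(y_pred_zero_other), axis=-1)

print(loss_inf)  # [inf]
print(loss_nan)  # [nan]
```

Note that case 2 is a perfect prediction, yet the naive code returns nan, because IEEE arithmetic does not follow the mathematical convention 0 · log 0 = 0.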

Answer 1

Regarding y_pred being 0 or 1: digging into the Keras backend source code for both binary_crossentropy and categorical_crossentropy, we get:

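import numpy as np

# Helpers the snippet relies on (standard definitions, not part of the quoted source)
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def binary_crossentropy(target, output, from_logits=False):
    if not from_logits:
        # Clip probabilities away from 0 and 1, then convert them to logits
        output = np.clip(output, 1e-7, 1 - 1e-7)
        output = np.log(output / (1 - output))
    return (target * -np.log(sigmoid(output)) +
            (1 - target) * -np.log(1 - sigmoid(output)))


def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        output = softmax(output)
    else:
        # Rescale each row to sum to 1 (without modifying the caller's array in place)
        output = output / output.sum(axis=-1, keepdims=True)
    # Clip so that np.log never receives exactly 0
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)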
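The binary_crossentropy above converts the clipped probabilities to logits with log(p / (1 - p)) and then applies sigmoid again, which simply recovers p. So for probability inputs it should reduce to the textbook formula -[t·log p + (1 - t)·log(1 - p)] on the clipped p. A quick numerical check of that claim; the names `bce_via_logits` and `bce_direct` are illustrative, and `sigmoid` is assumed to be the standard logistic function:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_via_logits(target, output):
    # Mirrors the quoted code path: clip, convert to logits, apply sigmoid again
    output = np.clip(output, 1e-7, 1 - 1e-7)
    logits = np.log(output / (1 - output))
    return target * -np.log(sigmoid(logits)) + (1 - target) * -np.log(1 - sigmoid(logits))

def bce_direct(target, output):
    # Textbook binary cross-entropy on the same clipped probabilities
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return -(target * np.log(output) + (1 - target) * np.log(1 - output))

target = np.array([0.0, 1.0, 1.0, 0.0])
output = np.array([0.0, 1.0, 0.3, 0.8])
print(np.allclose(bce_via_logits(target, output), bce_direct(target, output)))  # True
```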

From there you can clearly see that both functions apply a clipping operation to output (i.e. the predictions) in order to avoid infinite logarithms:

output = np.clip(output, 1e-7, 1 - 1e-7)

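Therefore y_pred is never exactly 0 or 1 in the underlying computation; it always lies in [1e-7, 1 - 1e-7]. Other frameworks handle this similarly.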
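A minimal sketch of the same clipping idea applied to the question's formula, assuming the 1e-7 epsilon from the quoted code (the function name `clipped_categorical_loss` is illustrative, and unlike the quoted categorical_crossentropy it does not renormalise the rows). With clipping, a perfect prediction gets a tiny finite loss instead of nan, and a confidently wrong prediction gets a large but finite penalty of -log(1e-7), about 16.12:

```python
import numpy as np

EPS = 1e-7  # same epsilon as the quoted np.clip call

def clipped_categorical_loss(y_true, y_pred, eps=EPS):
    # Keep predictions inside [eps, 1 - eps] so np.log never sees 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=-1))

y_true = np.array([[0.0, 1.0, 0.0]])
print(clipped_categorical_loss(y_true, np.array([[0.0, 1.0, 0.0]])))  # about 1e-07 (perfect prediction)
print(clipped_categorical_loss(y_true, np.array([[1.0, 0.0, 0.0]])))  # about 16.118 (confidently wrong)
```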

Regarding y_true being 0, there is no issue at all: the corresponding terms are set to 0, exactly as the mathematical definition requires.
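Because the zero entries of y_true contribute nothing, the sum over classes reduces to -log of the predicted probability for the true class alone. A small sketch checking this on the question's data, recovering integer labels from the one-hot rows with np.argmax:

```python
import numpy as np

y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0.05], [0.1, 0.8, 0.1]])

# Full one-hot formula: zero entries of y_true multiply their log terms by 0
one_hot_loss = -np.sum(y_true * np.log(y_pred), axis=-1)

# Equivalent: pick only the true-class probability in each row
labels = np.argmax(y_true, axis=-1)  # [1, 2]
picked_loss = -np.log(y_pred[np.arange(len(labels)), labels])

print(np.allclose(one_hot_loss, picked_loss))  # True
print(picked_loss)  # per-observation losses, approximately [0.0513 2.3026]
```

The discarded terms therefore carry no information the loss needs; the probabilities of the other classes only matter indirectly, through the constraint that each row sums to 1.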
