How is categorical_crossentropy implemented in Keras?

Question

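I'm trying to apply the concept of distillation: training a new, smaller network to do the same thing as the original one, but with less computation.

I have the softmax outputs for every sample instead of the logits.

My question is: how is the categorical cross-entropy loss function implemented? Does it take the maximum value of the original labels and multiply it by the corresponding predicted value at the same index, or does it sum over all the logits (one-hot encoding), as the formula does?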

Answer 1

I see that you used the tensorflow tag, so I guess that is the backend you are using?

def categorical_crossentropy(output, target, from_logits=False):
    """Categorical crossentropy between an output tensor and a target tensor.

    # Arguments
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        target: A tensor of the same shape as `output`.
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.

    # Returns
        Output tensor.
    """

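This code comes from the keras source code. Looking directly at the code should answer all your questions :) If you need more info, just ask!

EDIT:

Here is the code that interests you: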

# Note: tf.nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output,
                            reduction_indices=len(output.get_shape()) - 1,
                            keep_dims=True)
    # manual computation of crossentropy
    epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
    output = tf.clip_by_value(output, epsilon, 1. - epsilon)
    return - tf.reduce_sum(target * tf.log(output),
                          reduction_indices=len(output.get_shape()) - 1)

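If you look at the return statement, they sum it up: target * tf.log(output) is summed over the last (class) axis, so every class contributes to the loss, not only the one with the largest label value. :)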
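To make the summation concrete, here is a minimal pure-Python sketch of the non-logits branch quoted above, for a single sample. The function name `categorical_crossentropy_sketch`, the `EPSILON` value of 1e-7, and the example probability vectors are assumptions made for illustration; they are not taken from the Keras source excerpt.

```python
import math

# Assumed stand-in for the backend's _EPSILON; the excerpt does not show its value.
EPSILON = 1e-7


def categorical_crossentropy_sketch(output, target, epsilon=EPSILON):
    """Single-sample sketch of the `from_logits=False` branch."""
    # Rescale so the predicted class probabilities sum to 1.
    total = sum(output)
    probs = [p / total for p in output]
    # Clip into [epsilon, 1 - epsilon] so log() never receives 0.
    probs = [min(max(p, epsilon), 1.0 - epsilon) for p in probs]
    # Sum target * log(prob) over every class, then negate.
    return -sum(t * math.log(p) for t, p in zip(target, probs))


student = [0.7, 0.2, 0.1]   # hypothetical student softmax output
one_hot = [1.0, 0.0, 0.0]   # hard label
teacher = [0.6, 0.3, 0.1]   # hypothetical teacher softmax output (soft target)

hard_loss = categorical_crossentropy_sketch(student, one_hot)
soft_loss = categorical_crossentropy_sketch(student, teacher)
print(hard_loss, soft_loss)
```

With a one-hot target only the true-class term survives the sum, so the loss reduces to -log of the probability predicted for that class. With a soft target, as in distillation, every class contributes in proportion to its target probability.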

Answer 2

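In answer to "Do you happen to know what the epsilon and tf.clip_by_value is doing?":
it ensures that output != 0, because tf.log(0) evaluates to -inf, which would make the loss infinite or NaN.
(I don't have enough points to comment, but I thought I'd contribute.)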
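A small sketch of what the clipping step buys, written in plain Python rather than TensorFlow. The `EPSILON` value of 1e-7 and the helper `clip` are assumptions for illustration. Note one difference: plain Python's `math.log(0.0)` raises a `ValueError`, whereas `tf.log(0)` should evaluate to -inf.

```python
import math

# Assumed stand-in for the backend's _EPSILON.
EPSILON = 1e-7


def clip(p, lo, hi):
    """Scalar equivalent of clipping a value into [lo, hi]."""
    return min(max(p, lo), hi)


target = [1.0, 0.0]
output = [0.0, 1.0]  # the model assigns zero probability to the true class

# Without clipping, the log of the true-class probability is undefined.
try:
    math.log(output[0])
    unclipped_failed = False
except ValueError:
    unclipped_failed = True

# With clipping, the loss is large but finite: -log(EPSILON).
clipped = [clip(p, EPSILON, 1.0 - EPSILON) for p in output]
loss = -sum(t * math.log(p) for t, p in zip(target, clipped))
print(unclipped_failed, loss)
```

The upper bound of 1 - epsilon mirrors the lower one and keeps the clipped values strictly inside (0, 1).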

