Keras: binary_crossentropy & categorical_crossentropy confusion

Question

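After using TensorFlow for quite a while, I read some Keras tutorials and implemented a few examples. I found several tutorials for convolutional autoencoders that use keras.losses.binary_crossentropy as the loss function.

I thought binary_crossentropy would not be a multi-class loss function and would most likely expect binary labels, but in fact Keras (with the TF Python backend) calls tf.nn.sigmoid_cross_entropy_with_logits, which is actually intended for classification tasks with multiple independent classes that are not mutually exclusive.

On the other hand, my expectation for categorical_crossentropy was that it is intended for multi-class classification, where the target classes depend on each other but are not necessarily one-hot encoded.

However, the Keras documentation states: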

(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).

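If I'm not mistaken, this is just the special case of one-hot encoded classification tasks, but the underlying cross-entropy loss also works with probability distributions ("multi-class", dependent labels)?

Additionally, Keras implements it with tf.nn.softmax_cross_entropy_with_logits (TF Python backend), whose own documentation states: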

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

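Please correct me if I'm wrong, but it seems to me that the Keras documentation is, at least, not very "detailed"?!

So, what is the idea behind Keras' naming of the loss functions? Is the documentation correct? If binary cross-entropy really relied on binary labels, it should not work for autoencoders, right?! Likewise for categorical cross-entropy: if the documentation is correct, it should only work for one-hot encoded labels?!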

Answer 1

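Not sure if this answers your question, but for the softmax loss the output layer needs to be a probability distribution (i.e. sum to 1), whereas for the binary cross-entropy loss it does not. It's as simple as that. (Binary doesn't mean that there are only 2 output classes; it just means that each output is binary.)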
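To make the "sums to 1" point concrete, here is a minimal pure-Python sketch of the two activations. The function names and the logit values are made up for illustration; this is not the Keras implementation.

```python
import math

def sigmoid(logits):
    """Squash each logit independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    """Normalize logits into a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]  # hypothetical raw outputs for three classes
sig = sigmoid(logits)
soft = softmax(logits)

print(sum(sig))   # not constrained to 1: each output is its own yes/no probability
print(sum(soft))  # 1, up to floating-point rounding
```

With sigmoid outputs, several classes can be "on" at once, which is the situation binary cross-entropy is meant for.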

Answer 2

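You are right in how you define the areas where each of these losses applies:

binary_crossentropy (and tf.nn.sigmoid_cross_entropy_with_logits under the hood) is for binary multi-label classification (labels are independent).
categorical_crossentropy (and tf.nn.softmax_cross_entropy_with_logits under the hood) is for multi-class classification (classes are exclusive).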
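The difference between the two can be sketched directly from the formulas. The code below is a plain-Python illustration that works on probabilities rather than logits; the function names mirror the Keras ones only for readability, and the clipping constant EPS is an assumption, not a value taken from Keras.

```python
import math

EPS = 1e-7  # assumed clipping constant to avoid log(0)

def binary_crossentropy(y_true, y_pred):
    """Mean of independent per-label Bernoulli cross-entropies."""
    terms = []
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1 - EPS)
        terms.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between two distributions over exclusive classes."""
    return -sum(t * math.log(max(p, EPS)) for t, p in zip(y_true, y_pred))

# Multi-label: one sample can carry several tags at once.
print(binary_crossentropy([1, 0, 1], [0.9, 0.2, 0.7]))

# Multi-class: exactly one class is correct and the predictions sum to 1.
print(categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1]))
```

In the binary case every label contributes a term whether it is 0 or 1; in the categorical case only classes with non-zero target mass contribute.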

See also the detailed analysis in [link missing].

I'm not sure which tutorials you mean, so I can't comment on whether binary_crossentropy is a good or bad choice for autoencoders.
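Independent of any particular tutorial, one property of the formula can be checked numerically: with a real-valued target t in [0, 1] (for example a normalized pixel), the binary cross-entropy is still minimized when the prediction equals t, although the minimum is no longer zero. A small sketch, with a made-up target value and a coarse grid search:

```python
import math

def bce(t, p):
    """Binary cross-entropy for one target t in [0, 1] and prediction p in (0, 1)."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

target = 0.3  # hypothetical normalized pixel intensity

# Candidate predictions 0.01, 0.02, ..., 0.99
grid = [i / 100 for i in range(1, 100)]
best = min(grid, key=lambda p: bce(target, p))

print(best)               # the grid point closest to the target
print(bce(target, best))  # positive: the floor is the entropy of the target
```

So the loss does not strictly require binary labels to be usable, but its value at the optimum depends on the targets.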

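As for the naming, it is absolutely correct and reasonable. Or do you think names based on sigmoid and softmax would sound better?

So the only confusing thing left in your question is the categorical_crossentropy documentation. Note that everything it states is correct: the loss supports the one-hot representation. With the TensorFlow backend, this function indeed works with any probability distribution over the labels (in addition to one-hot vectors), and this could be mentioned in the docs, but it doesn't look critical to me. Moreover, one would need to check whether the other backends, Theano and CNTK, support soft classes. Remember that Keras tries to be minimalistic and targets the most popular use cases, so I can understand the logic here.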
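To illustrate the "any probability distribution" point, here is a hand-rolled version of the formula applied to a one-hot target and to a soft target. It is a sketch of the math only, with made-up numbers, and says nothing about what the Theano or CNTK backends accept.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """-sum_i y_true[i] * log(y_pred[i]); y_true may be any distribution."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_pred = [0.7, 0.2, 0.1]    # hypothetical softmax output

one_hot = [1.0, 0.0, 0.0]   # the case described in the Keras docs
soft = [0.8, 0.1, 0.1]      # soft label: still a valid distribution

loss_one_hot = categorical_crossentropy(one_hot, y_pred)
loss_soft = categorical_crossentropy(soft, y_pred)

print(loss_one_hot)  # reduces to -log(0.7)
print(loss_soft)     # every class with non-zero target mass contributes
```

The one-hot case is simply the special case in which all but one term of the sum vanish.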

Answer 3

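The documentation doesn't mention that BinaryCrossentropy can be used for multi-label classification, which can be confusing. But it can also be used for a binary classifier (when we have only 2 exclusive classes, like cats and dogs); see the classic example. In that case, however, we have to set n_classes=1: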

tf.keras.layers.Dense(units=1)
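Why a single unit is enough can be checked by hand: one sigmoid output p already determines the two-class distribution [1 - p, p], so binary cross-entropy on one unit gives the same number as categorical cross-entropy on two exclusive classes. A minimal sketch with made-up values (the helper names bce and cce are just for illustration):

```python
import math

def bce(t, p):
    """Binary cross-entropy for a single output."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def cce(y_true, y_pred):
    """Categorical cross-entropy over exclusive classes."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

p_dog = 0.8   # hypothetical sigmoid output of the single unit
label = 1     # 1 = dog, 0 = cat

one_unit = bce(label, p_dog)
two_units = cce([1 - label, label], [1 - p_dog, p_dog])

print(one_unit, two_units)  # the two values agree
```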

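The BinaryCrossentropy class and the tf.keras.losses.binary_crossentropy function also behave differently: the class reduces the result to a single scalar averaged over the batch, while the function returns one loss value per sample.

Let's look at the example from the documentation to show that it is actually meant for multi-label classification: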

import numpy as np
import tensorflow as tf

# Two samples, each with two independent binary labels
y_true = tf.convert_to_tensor([[0, 1], [0, 0]])
y_pred = tf.convert_to_tensor([[0.6, 0.4], [0.4, 0.6]])

# The class averages over the labels and over the batch
bce = tf.keras.losses.BinaryCrossentropy()
loss1 = bce(y_true=y_true, y_pred=y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.81492424>

# The function averages over the labels only: one value per sample
loss2 = tf.keras.losses.binary_crossentropy(y_true, y_pred)
# <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.9162905 , 0.71355796], dtype=float32)>

# Averaging the per-sample values reproduces loss1
np.mean(loss2.numpy())
# 0.81492424

# For comparison: sparse categorical cross-entropy takes class indices as targets
scce = tf.keras.losses.SparseCategoricalCrossentropy()
y_true = tf.convert_to_tensor([0, 0])
scce(y_true, y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.71355814>
y_true = tf.convert_to_tensor([1, 0])
scce(y_true, y_pred)
# <tf.Tensor: shape=(), dtype=float32, numpy=0.9162907>
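The numbers above can be reproduced by hand, which makes it easier to see what each loss averages over. This is a plain-Python re-computation of the formulas, not the TensorFlow code path; the helper names are made up.

```python
import math

y_pred = [[0.6, 0.4], [0.4, 0.6]]

def bce_row(y_true_row, y_pred_row):
    """Binary cross-entropy averaged over the labels of one sample."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true_row, y_pred_row)]
    return sum(terms) / len(terms)

def sparse_cce(class_ids, y_pred):
    """Mean of -log(probability assigned to the true class index)."""
    losses = [-math.log(row[k]) for k, row in zip(class_ids, y_pred)]
    return sum(losses) / len(losses)

# Per-sample values, as returned by the binary_crossentropy function
per_sample = [bce_row(t, p) for t, p in zip([[0, 1], [0, 0]], y_pred)]
print(per_sample)                         # approx. [0.9163, 0.7136]

# Batch mean, as returned by the BinaryCrossentropy class
print(sum(per_sample) / len(per_sample))  # approx. 0.8149

# Sparse categorical cross-entropy looks only at the true class of each sample
print(sparse_cce([0, 0], y_pred))         # approx. 0.7136
print(sparse_cce([1, 0], y_pred))         # approx. 0.9163
```

Binary cross-entropy penalizes every label of every sample, including the labels that are 0, whereas the sparse categorical loss only uses the probability of the single true class.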
