What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?

Question

I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I cannot figure out how it differs from tf.nn.softmax_cross_entropy_with_logits.

Is the only difference that the training vectors y have to be one-hot encoded when using sparse_softmax_cross_entropy_with_logits?

Reading the API, I could not find any other difference compared to softmax_cross_entropy_with_logits. But then why do we need the extra function?

Shouldn't softmax_cross_entropy_with_logits produce the same results as sparse_softmax_cross_entropy_with_logits if it is supplied with one-hot encoded training data/vectors?

Answer 1

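Having two different functions is a convenience, as they produce the same result.

The difference is simple:

For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in the range [0, num_classes-1].
For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and the dtype float32 or float64.

The labels used in softmax_cross_entropy_with_logits are the one-hot version of the labels used in sparse_softmax_cross_entropy_with_logits.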
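The two label formats can be sketched in plain Python. This is only an illustration of the shapes described above, not TensorFlow code; the helper `one_hot` and the sample values are made up for the example.

```python
def one_hot(index, num_classes):
    """Hypothetical helper: turn one class index into a one-hot row of floats."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

num_classes = 4

# Sparse format: shape [batch_size], integer class indices in [0, num_classes - 1].
sparse_labels = [2, 0, 3]

# Dense format: shape [batch_size, num_classes], one float row per example.
dense_labels = [one_hot(c, num_classes) for c in sparse_labels]

# Going back: the position of the 1.0 in each row recovers the sparse label.
recovered = [row.index(1.0) for row in dense_labels]

print(dense_labels)
print(recovered)
```

Both lists carry the same information; the sparse form simply stores one integer per example instead of num_classes floats.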

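Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can reportedly give -1 as a label to get a loss of 0 for that label. Do not rely on this: the TF documentation states that each label must be an index in [0, num_classes), and that other values raise an exception on CPU and return NaN for the corresponding loss and gradient rows on GPU.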

Answer 2

I just want to add two things to the accepted answer, both of which you can also find in the TF documentation.

First:

tf.nn.softmax_cross_entropy_with_logits

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

Second:

tf.nn.sparse_softmax_cross_entropy_with_logits

NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry).
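To make the two notes concrete, here is a small plain-Python sketch of a "soft" label row. It illustrates the math only and is not TensorFlow's implementation; the helpers `log_softmax` and `cross_entropy` and all numbers are assumptions chosen for the example.

```python
import math

def log_softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def cross_entropy(logits_row, label_row):
    # -sum_i y_i * log(softmax(logits)_i), for a label row that sums to 1.
    return -sum(y * lp for y, lp in zip(label_row, log_softmax(logits_row)))

logits = [2.0, 1.0, 0.1]

hard_label = [1.0, 0.0, 0.0]  # one-hot row: equivalent to the class index 0
soft_label = [0.7, 0.2, 0.1]  # valid distribution, but no single index describes it

hard_loss = cross_entropy(logits, hard_label)
soft_loss = cross_entropy(logits, soft_label)

print(hard_loss, soft_loss)
```

A row like soft_label can only be passed in the dense [batch_size, num_classes] format, which is why the sparse function is limited to exclusive classes.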

Answer 3

Both functions compute the same result; sparse_softmax_cross_entropy_with_logits computes the cross entropy directly on the sparse labels instead of first converting them with one-hot encoding.

You can verify this by running the following program:
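Why the results agree can be seen from a short derivation. The symbols are assumptions introduced here: z is one row of logits, y the matching dense label row, and c the index of the true class.

```latex
\begin{aligned}
p_i &= \frac{e^{z_i}}{\sum_j e^{z_j}} \\
L_{\text{dense}}(z, y) &= -\sum_i y_i \log p_i \\
y_i = \mathbb{1}[i = c] \;\Rightarrow\; L_{\text{dense}}(z, y) &= -\log p_c
  = -z_c + \log \sum_j e^{z_j} = L_{\text{sparse}}(z, c)
\end{aligned}
```

With a one-hot row, every term except the one at index c vanishes, so the sparse variant only needs that index to reach the same value.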

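import tensorflow as tf  # TensorFlow 1.x API (tf.random_uniform, tf.Session)
from random import randint

dims = 8
pos = randint(0, dims - 1)  # index of the true class

# Random logits vector and the matching one-hot label vector.
logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)

res1 = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))

# Evaluate both losses in a single run so they share the same random logits.
with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)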

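Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at pos is 1 and all others are 0).

After that I compute the softmax cross entropy and the sparse softmax cross entropy and compare their outputs. Try rerunning it a few times to make sure that it always produces the same output.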

