Calculating Cross Entropy in TensorFlow

I am having a hard time calculating cross entropy in TensorFlow. In particular, I am using the function:

tf.nn.softmax_cross_entropy_with_logits()

With what seems to be simple code, I can only get it to return zero:

# TensorFlow 1.x API (placeholders and sessions)
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

# Both tensors have a single column, i.e. a single class per row
a = tf.placeholder(tf.float32, shape=[None, 1])
b = tf.placeholder(tf.float32, shape=[None, 1])
sess.run(tf.global_variables_initializer())
c = tf.nn.softmax_cross_entropy_with_logits(
    logits=b, labels=a
).eval(feed_dict={b: np.array([[0.45]]), a: np.array([[0.2]])})
print(c)

returns

0

My understanding of cross entropy is as follows:

H(p,q) = -sum_x p(x)*log(q(x))

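where p(x) is the true probability of event x and q(x) is the predicted probability of event x.

For any two input values p(x) and q(x) such that

0<p(x)<1 AND 0<q(x)<1

there should be a nonzero cross entropy. I suspect that I am using TensorFlow incorrectly. Thanks in advance for any help.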

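As the saying goes, you can't spell "softmax_cross_entropy_with_logits" without "softmax". The softmax of [0.45] is [1], and log(1) is 0. With only one class per row, softmax always outputs 1, so the cross entropy is always 0 regardless of the label. The documentation describes the op as follows: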
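To make this concrete, here is a minimal pure-Python sketch of what the op computes for a single row (softmax followed by cross entropy). The helper names `softmax` and `cross_entropy_with_logits` are made up for illustration and are not TensorFlow APIs; the second class in the two-class call is an assumed example value.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_with_logits(labels, logits):
    # -sum_i labels[i] * log(softmax(logits)[i]) for a single row
    probs = softmax(logits)
    return -sum(p * math.log(q) for p, q in zip(labels, probs))

# One class: softmax of a single logit is always [1.0], so the loss is 0.
single = cross_entropy_with_logits([0.2], [0.45])

# Two classes (assumed example values): the loss is now nonzero.
two = cross_entropy_with_logits([0.2, 0.8], [0.45, 0.0])

print(single, two)
```

In the question's code, the equivalent change would be to give both placeholders at least two columns, so that each row holds a full distribution over classes.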

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float16, float32, or float64).
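The warning about unscaled logits can be illustrated with a small pure-Python sketch. Passing already-softmaxed probabilities in as if they were logits applies softmax twice and changes the loss. `softmax` and `xent` are illustrative helpers, not TensorFlow functions, and the input values are arbitrary.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def xent(labels, logits):
    # Mimics the op for one row: softmax is applied inside.
    return -sum(p * math.log(q) for p, q in zip(labels, softmax(logits)))

labels = [1.0, 0.0, 0.0]   # a valid probability distribution (sums to 1)
logits = [2.0, 1.0, 0.1]   # unscaled scores

correct = xent(labels, logits)           # raw logits, as the op expects
wrong = xent(labels, softmax(logits))    # softmax applied twice

print(correct, wrong)
```

The second softmax flattens the distribution, so the loss for the true class comes out larger than it should.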

In addition to Don's answer (+1), the following explanation may interest you, as it gives the formula for calculating cross entropy in TensorFlow:

An alternative way to write:

xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)

...would be:

softmax = tf.nn.softmax(logits)
xent = -tf.reduce_sum(labels * tf.log(softmax), 1)

However, this alternative would be (i) less numerically stable (since the softmax may compute much larger values) and (ii) less efficient (since some redundant computation would happen in the backprop). For real uses, we recommend that you use tf.nn.softmax_cross_entropy_with_logits().
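Point (i) can be illustrated without TensorFlow. In this rough pure-Python sketch, the naive form exponentiates the raw logits and overflows for large values, while subtracting the maximum logit first (the log-sum-exp trick) computes the same quantity safely. Whether TensorFlow's kernel uses exactly this formulation is an assumption; the function names are illustrative.

```python
import math

def naive_xent(labels, logits):
    exps = [math.exp(z) for z in logits]   # overflows for large logits
    total = sum(exps)
    return -sum(p * math.log(e / total) for p, e in zip(labels, exps))

def stable_xent(labels, logits):
    m = max(logits)
    # logsumexp(z) = m + log(sum(exp(z - m)))
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax_i = z_i - logsumexp(z)
    return -sum(p * (z - log_sum) for p, z in zip(labels, logits))

labels = [0.0, 1.0]
small = [1.0, 2.0]
large = [1000.0, 1001.0]   # same differences as `small`, shifted by 999

try:
    naive_large = naive_xent(labels, large)
except OverflowError:
    naive_large = None     # math.exp(1000.0) is out of float range

print(naive_xent(labels, small), stable_xent(labels, small))
print(naive_large, stable_xent(labels, large))
```

Because softmax depends only on differences between logits, `small` and `large` describe the same distribution, and the stable version returns the same loss for both.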

Here is an implementation in TensorFlow 2.0, in case somebody else (probably me) needs it in the future.

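import tensorflow as tf

@tf.function
def cross_entropy(x, y, epsilon=1e-9):
    # Cross entropy in bits; x holds predictions, y holds labels.
    # epsilon guards against log(0).
    # Note: -2 * reduce_mean equals -reduce_sum only for two classes per row.
    return -2 * tf.reduce_mean(y * tf.math.log(x + epsilon), -1) / tf.math.log(2.)

x = tf.constant([
    [1.0, 0.0],
    [0.5, 0.5],
    [0.75, 0.25],
], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x)
    # Passing x as both arguments gives the entropy of each row
    y = cross_entropy(x, x)

tf.print(y)
tf.print(tape.gradient(y, x))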

Output

[-0 1 0.811278105]
[[-1.44269502 29.8973541]
 [-0.442695022 -0.442695022]
 [-1.02765751 0.557305]]
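Because the code above passes the same tensor as both arguments, the first printed line is the Shannon entropy of each row in bits. A quick pure-Python cross-check of those three values, using a hypothetical helper `entropy_bits`:

```python
import math

def entropy_bits(row):
    # Terms with p == 0 contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in row if p > 0)

rows = [[1.0, 0.0], [0.5, 0.5], [0.75, 0.25]]
values = [entropy_bits(r) for r in rows]
print(values)
```

The values agree with the printed tensor up to float32 rounding: a certain outcome carries 0 bits, a fair coin carries 1 bit, and a 75/25 split carries about 0.81 bits.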