loss(y, y) != 0 (same labels and predictions, non-zero loss)

import numpy as np
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model  # TF 2.2.0

#%%#######################################################
ipt = Input(batch_shape=(128, 28, 28, 1))
x   = Flatten()(ipt)
out = Dense(10, activation='softmax')(x)
model = Model(ipt, out)
model.compile('adam', 'categorical_crossentropy')

#%%#######################################################
x = np.random.uniform(0, 1, model.input_shape)  # batch_shape is fixed, so this is (128, 28, 28, 1)

pred = model(x, training=True)  # training=False also works
loss = model.compiled_loss(pred, pred)  # predictions passed as their own labels
print(loss)

Output:

tf.Tensor(1.9904033, shape=(), dtype=float32)

What's the deal?

This is just how the categorical_crossentropy loss works. If you try it with a one-hot label like [0,0,0,1,0,0,0,0,0,0], the loss is (effectively) zero. And if you change categorical_crossentropy to mse in your original code, you also get zero, since mse(y, y) = 0 for any y (see the sketch after the snippet below).

import numpy as np
import tensorflow as tf  # TF 2.2.0
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model

ipt = Input(shape=(28, 28, 1))
x   = Flatten()(ipt)
out = Dense(10, activation='softmax')(x)
model = Model(ipt, out)
model.compile('adam', 'categorical_crossentropy')

label = tf.one_hot([5,3,2], depth=10)
# tf.Tensor(
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]], shape=(3, 10), dtype=float32)
loss = model.compiled_loss(label, label)  # one-hot tensor used as both label and prediction
print(loss) # tf.Tensor(1.1920929e-07, shape=(), dtype=float32)
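
The mse claim can be checked the same way. Below is a minimal sketch reusing the question's model; the only change (my reading of "change to mse") is the loss string passed to compile:

import numpy as np
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(128, 28, 28, 1))
x   = Flatten()(ipt)
out = Dense(10, activation='softmax')(x)
model = Model(ipt, out)
model.compile('adam', 'mse')  # mse instead of categorical_crossentropy

x = np.random.uniform(0, 1, model.input_shape)
pred = model(x, training=True)
loss = model.compiled_loss(pred, pred)  # mean((pred - pred)**2) is exactly 0
print(loss)  # tf.Tensor(0.0, shape=(), dtype=float32)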

Edit:

A numpy implementation of the categorical crossentropy loss would be:

import numpy as np

def cce(y_label, y_pred):
    # negative sum over classes of label * log(prediction)
    return np.sum(-y_label * np.log(y_pred))

x = np.random.uniform(0, 1, (10,))
print(cce(x, x))  # yields values like 1.9904033
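
Note that the random x here is not a probability distribution (it does not sum to 1), while the Keras backend (as of TF 2.x) rescales the prediction to sum to 1 before taking the log. A quick check (a sketch, assuming you normalize the input first) shows the two computations then agree:

import numpy as np
import tensorflow as tf

def cce(y_label, y_pred):
    return np.sum(-y_label * np.log(y_pred))

x = np.random.uniform(0, 1, (10,))
p = x / x.sum()  # normalize into a valid probability distribution

print(cce(p, p))                                               # entropy of p
print(tf.keras.losses.categorical_crossentropy(p, p).numpy())  # matches up to clipping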

This illustrates why the loss is not zero: you multiply the labels by the log of the predictions, sum, and negate, i.e. loss(y, y) = -sum(y_i * log(y_i)), which is the entropy of y and vanishes only when y is one-hot. So, to answer "what's the deal": this is working as intended.
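
One caveat on the numpy sketch: on an exact one-hot vector it returns nan, because 0 * log(0) is undefined. Keras avoids this by clipping predictions into [eps, 1 - eps] (with eps = tf.keras.backend.epsilon(), 1e-7 by default), which is also why the one-hot loss above printed ~1.19e-07 rather than exactly zero: in float32, 1 - 1e-7 rounds to 1 - 1.19e-07, and -log of that is the printed value. A sketch of the same idea (cce_clipped is my own name):

import numpy as np

def cce_clipped(y_label, y_pred, eps=1e-7):
    # clip the way the Keras backend does, so log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return np.sum(-y_label * np.log(y_pred))

one_hot = np.eye(10)[5]               # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(cce_clipped(one_hot, one_hot))  # ~1e-07, i.e. effectively zero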