Keras BinaryCrossentropy loss gives NaN for angular distance between two vectors
I want to train a siamese LSTM so that the angular distance between the two outputs is 1 (low similarity) if the corresponding label is 0, and 0 (high similarity) if the label is 1.
I took the formula for the angular distance from here: https://en.wikipedia.org/wiki/Cosine_similarity
Here is my model code:
import math
import numpy as np
import tensorflow as tf

# inputs are unicode encoded int arrays from strings
# similar strings should yield a low angular distance
left_input = tf.keras.layers.Input(shape=[None, 1], dtype='float32')
right_input = tf.keras.layers.Input(shape=[None, 1], dtype='float32')
lstm = tf.keras.layers.LSTM(10)
left_embedding = lstm(left_input)
right_embedding = lstm(right_input)
# cosine_layer is the operation to get cosine similarity
cosine_layer = tf.keras.layers.Dot(axes=1, normalize=True)
cosine_similarity = cosine_layer([left_embedding, right_embedding])
# next two lines calculate the angular distance but with inverted labels
arccos = tf.math.acos(cosine_similarity)
angular_distance = arccos / math.pi # not 1. - (arccos / math.pi)
model = tf.keras.Model([left_input, right_input], [angular_distance])
model.compile(loss='binary_crossentropy', optimizer='sgd')
print(model.summary())
The model summary looks fine to me, and when testing with fixed input values I get the correct cosine similarity etc. (a minimal version of such a check is sketched after the summary below):
Model: "model_37"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_95 (InputLayer) [(None, None, 1)] 0
__________________________________________________________________________________________________
input_96 (InputLayer) [(None, None, 1)] 0
__________________________________________________________________________________________________
lstm_47 (LSTM) (None, 10) 480 input_95[0][0]
input_96[0][0]
__________________________________________________________________________________________________
dot_47 (Dot) (None, 1) 0 lstm_47[0][0]
lstm_47[1][0]
__________________________________________________________________________________________________
tf_op_layer_Acos_52 (TensorFlow [(None, 1)] 0 dot_47[0][0]
__________________________________________________________________________________________________
tf_op_layer_truediv_37 (TensorF [(None, 1)] 0 tf_op_layer_Acos_52[0][0]
__________________________________________________________________________________________________
tf_op_layer_sub_20 (TensorFlowO [(None, 1)] 0 tf_op_layer_truediv_37[0][0]
__________________________________________________________________________________________________
tf_op_layer_sub_21 (TensorFlowO [(None, 1)] 0 tf_op_layer_sub_20[0][0]
__________________________________________________________________________________________________
tf_op_layer_Abs (TensorFlowOpLa [(None, 1)] 0 tf_op_layer_sub_21[0][0]
==================================================================================================
Total params: 480
Trainable params: 480
Non-trainable params: 0
__________________________________________________________________________________________________
None
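For example, a check along these lines (the test vectors here are just made-up placeholders):
x_left = np.array([[[1.], [2.], [3.]]], dtype='float32')   # shape (batch, timesteps, 1)
x_right = np.array([[[1.], [2.], [4.]]], dtype='float32')
print(model.predict([x_left, x_right]))  # one angular-distance value per pair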
But while training I always get a loss of NaN:
model.fit([np.array(x_left_train), np.array(x_right_train)], np.array(y_train).reshape((-1,1)), batch_size=1, epochs=2, validation_split=0.1)
Train on 14400 samples, validate on 1600 samples
Epoch 1/2
673/14400 [>.............................] - ETA: 5:42 - loss: nan
Is this not the right way to get the similarity between two vectors and to train my network to produce such vectors?
Binary cross-entropy computes log(output) and log(1 - output). That means your output needs to be strictly greater than 0 and strictly less than 1, otherwise you end up taking the log of a negative number, which results in NaN. (Note: log(0) should give you -inf, which is not as bad as NaN, but still not ideal.)
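You can see both failure modes directly in NumPy, independent of the model above:
import numpy as np

print(np.log(0.0))     # -inf (divide-by-zero RuntimeWarning)
print(np.log(-0.001))  # nan  (invalid-value RuntimeWarning)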
Mathematically your output should be inside the right interval, but because of floating-point inaccuracies I can well imagine this being your problem. That is just a guess, though.
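One way such a floating-point overshoot could already bite upstream of the loss (an assumption, not verified against your data): if the normalized dot product comes out marginally above 1, acos returns nan, and that nan then propagates into the loss. With np as above:
print(np.arccos(1.0))        # 0.0, fine
print(np.arccos(1.0 + 1e-7)) # nan, arccos is only defined on [-1, 1]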
So try forcing the output to be greater than 0 and smaller than 1, for example by clipping it with a small epsilon:
angular_distance = tf.keras.backend.clip(angular_distance, 1e-6, 1 - 1e-6)
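For context, here is a sketch of how that clip could slot into the model code from the question, reusing the same layer setup with nothing else changed:
arccos = tf.math.acos(cosine_similarity)
angular_distance = arccos / math.pi
# keep the model output strictly inside (0, 1) before binary cross-entropy
angular_distance = tf.keras.backend.clip(angular_distance, 1e-6, 1 - 1e-6)
model = tf.keras.Model([left_input, right_input], [angular_distance])
model.compile(loss='binary_crossentropy', optimizer='sgd')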