TensorFlow BinaryCrossentropy loss quickly reaches NaN

TL;DR - ML model loss quickly reaches NaN when retraining on new data. None of the "standard" fixes work.

Hello,

I recently (and successfully) trained a CNN/dense-layered model to classify spectrograms (image representations of audio). I wanted to try training this model again with new data, after making sure it was the right size and so on.

However, for some reason, the BinaryCrossentropy loss decreases steadily until it reaches about 1.000, then suddenly becomes NaN within the first epoch. I have tried lowering the learning rate to 1e-8, using ReLU throughout the network, and using sigmoid on the final layer, but nothing seems to help. The problem occurs even when I reduce the network to dense layers only. I did normalize my data by hand, and I am quite confident I did it correctly, so all of my data should lie in [0, 1]. There could be a hole in that assumption, but I think it is unlikely (a quick check for this follows the model code below).

I have attached my model architecture code here:

from tensorflow.keras import layers, models, regularizers

input_shape = (125, 128, 1)

model = models.Sequential([
    
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=input_shape),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    
    layers.Dropout(0.3),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
    
])
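
As a quick sanity check on that normalization claim (a minimal sketch, assuming the data lives in NumPy arrays named X_train and y_train, which are not shown in this post):

import numpy as np

# A single NaN/Inf pixel or label anywhere in the data is enough to
# poison BinaryCrossentropy, so verify the tensors before the model.
print("NaNs in X:", np.isnan(X_train).any())
print("Infs in X:", np.isinf(X_train).any())
print("X range:  ", X_train.min(), "->", X_train.max())
print("NaNs in y:", np.isnan(y_train).any())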

Interestingly, though, I tried fine-tuning a VGG16 model on this new data, and it worked! (No NaN loss issues.) I have attached that code here too, but I really don't see where (or if) it differs in a way that could cause the problem:

from tensorflow import keras
from tensorflow.keras import regularizers

base_model = keras.applications.VGG16(
    weights="imagenet", 
    input_shape=(125, 128, 3),
    include_top=False,
) 

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(125, 128, 3))
x = inputs
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)

model.summary()

I think I have read all the "textbook" solutions, but I still can't seem to find the root of the problem. Any help would be greatly appreciated.

Remove all the unnecessary kernel_regularizers, BatchNormalization, and Dropout layers from the convolution blocks.
Keep kernel_regularizers and Dropout only on the Dense layers of the model definition, and change the number of kernels in the Conv2D layers.

Then try training your model again with the following code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Rescaling, Conv2D, MaxPooling2D,
                                     Flatten, Dense, Dropout)
from tensorflow.keras import regularizers

model = Sequential([
    # img_h / img_w: height and width of the input spectrograms
    Rescaling(1./255, input_shape=(img_h, img_w, 3)),
    Conv2D(16, (3, 3), activation='relu'),#, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    #BatchNormalization(),

    Conv2D(16, (3, 3), activation='relu'),#, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    #BatchNormalization(),
    
    Conv2D(32, (3, 3), activation='relu'),#, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    #BatchNormalization(),
    
    Conv2D(32, (2, 2), activation='relu'),#, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    #BatchNormalization(),
    
    Conv2D(64, (2, 2), activation='relu'),#, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    #BatchNormalization(),
    
    Conv2D(64, (2, 2), activation='relu'),#, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    #BatchNormalization(),
    
    #Dropout(0.3),
    
    Flatten(),
    Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
    
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])


model.fit(...)

Output:

Epoch 1/20
63/63 [==============================] - 9s 97ms/step - loss: 1.0032 - accuracy: 0.5035 - val_loss: 0.8219 - val_accuracy: 0.6160
Epoch 2/20
63/63 [==============================] - 6s 88ms/step - loss: 0.7575 - accuracy: 0.5755 - val_loss: 0.7256 - val_accuracy: 0.6120
Epoch 3/20
63/63 [==============================] - 6s 88ms/step - loss: 0.7181 - accuracy: 0.5805 - val_loss: 0.6917 - val_accuracy: 0.6360
Epoch 4/20
63/63 [==============================] - 6s 88ms/step - loss: 0.6749 - accuracy: 0.6190 - val_loss: 0.6671 - val_accuracy: 0.6300
Epoch 5/20
63/63 [==============================] - 6s 95ms/step - loss: 0.6571 - accuracy: 0.6500 - val_loss: 0.6850 - val_accuracy: 0.5980
Epoch 6/20
63/63 [==============================] - 5s 80ms/step - loss: 0.6319 - accuracy: 0.6720 - val_loss: 0.6243 - val_accuracy: 0.6730
Epoch 7/20
63/63 [==============================] - 6s 90ms/step - loss: 0.5923 - accuracy: 0.6935 - val_loss: 0.6144 - val_accuracy: 0.7120
Epoch 8/20
63/63 [==============================] - 6s 89ms/step - loss: 0.5643 - accuracy: 0.7205 - val_loss: 0.6136 - val_accuracy: 0.6700
Epoch 9/20
63/63 [==============================] - 6s 93ms/step - loss: 0.5552 - accuracy: 0.7380 - val_loss: 0.5669 - val_accuracy: 0.7080
Epoch 10/20
63/63 [==============================] - 4s 58ms/step - loss: 0.5423 - accuracy: 0.7400 - val_loss: 0.5819 - val_accuracy: 0.7120
Epoch 11/20
63/63 [==============================] - 4s 57ms/step - loss: 0.4905 - accuracy: 0.7745 - val_loss: 0.6146 - val_accuracy: 0.7020
Epoch 12/20
63/63 [==============================] - 4s 57ms/step - loss: 0.4808 - accuracy: 0.7900 - val_loss: 0.6318 - val_accuracy: 0.7070
Epoch 13/20
63/63 [==============================] - 4s 60ms/step - loss: 0.4602 - accuracy: 0.7990 - val_loss: 0.5707 - val_accuracy: 0.7160
Epoch 14/20
63/63 [==============================] - 4s 61ms/step - loss: 0.4291 - accuracy: 0.8190 - val_loss: 0.6392 - val_accuracy: 0.6910
Epoch 15/20
63/63 [==============================] - 5s 69ms/step - loss: 0.4003 - accuracy: 0.8355 - val_loss: 0.7048 - val_accuracy: 0.7110
Epoch 16/20
63/63 [==============================] - 4s 58ms/step - loss: 0.3658 - accuracy: 0.8430 - val_loss: 0.8027 - val_accuracy: 0.7180
Epoch 17/20
63/63 [==============================] - 4s 58ms/step - loss: 0.3069 - accuracy: 0.8750 - val_loss: 0.9428 - val_accuracy: 0.6970
Epoch 18/20
63/63 [==============================] - 4s 59ms/step - loss: 0.2601 - accuracy: 0.9005 - val_loss: 0.9420 - val_accuracy: 0.7170
Epoch 19/20
63/63 [==============================] - 4s 60ms/step - loss: 0.2061 - accuracy: 0.9230 - val_loss: 0.9134 - val_accuracy: 0.7290
Epoch 20/20
63/63 [==============================] - 4s 62ms/step - loss: 0.1770 - accuracy: 0.9330 - val_loss: 1.0805 - val_accuracy: 0.6930
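
If the NaNs had persisted even with this simplified model, another standard safeguard (not tried anywhere in this thread) is to clip gradients in the optimizer, for example:

# clipnorm rescales each gradient tensor so its norm is at most 1.0
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)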

It turned out to be a problem with some of my input data (a division-by-zero error during normalization). Sorry for the trouble, and thanks for the help.
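
For anyone who runs into the same thing, a small guard in the normalization step avoids the division by zero. A sketch, assuming per-spectrogram min-max scaling (an assumption about the original pipeline):

import numpy as np

def normalize_spectrogram(spec, eps=1e-8):
    # Min-max scale to [0, 1]. A constant (e.g. all-silence) spectrogram
    # has max == min, and dividing by that zero range is exactly what
    # produces NaNs downstream.
    lo, hi = spec.min(), spec.max()
    if hi - lo < eps:
        return np.zeros_like(spec, dtype=np.float32)
    return ((spec - lo) / (hi - lo)).astype(np.float32)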