LSTM network loss is nan for batch size bigger than one

Question

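I am trying to analyse EEG data using an LSTM network. I split the data into 4-second segments, which resulted in around 17000 data samples. To that end, I built the following network: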

def load_model():
    model = Sequential()
    model.add(LSTM(5, recurrent_dropout=0.1, activation="relu",
                   input_shape=(data_length, number_of_channels),
                   return_sequences=True,
                   kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.00001, l2=0.00001)))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(units=1, activation="sigmoid"))
    model.compile(optimizer=Adam(learning_rate=0.00001, clipvalue=1.5),
                  loss='binary_crossentropy',
                  metrics=['accuracy', F1_scores, Precision, Sensitivity, Specificity],
                  run_eagerly=True)
    return model

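During training, the loss becomes nan within the first few batches. To avoid this, I tried adding recurrent dropout, l1/l2 regularization, gradient clipping, and normal dropout. I also tried changing the learning rate and the batch size. The only thing that worked was setting the recurrent dropout to 0.9 with low l1 and l2 values (0.00001); I also had to reduce the number of cells in the LSTM layer from the initial 30 to 5. Is there any other way to avoid the nan loss without dropping so many features and putting such a high penalty on the gradient?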

I am using tensorflow-directml provided by Microsoft, with TensorFlow version 1.15.1 and Keras 2.7.0.

Answer 1

The problem was solved by initializing the kernel of the LSTM layer to small values. This was done by changing the following line:

model.add(LSTM(5,recurrent_dropout=0.1,activation="relu",input_shape=(data_length, number_of_channels),
                    return_sequences=True, kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.00001, l2=0.00001)))

to:

model.add(LSTM(5, recurrent_dropout=0.2,
               kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.00001, seed=7),
               activation="relu", input_shape=(data_length, number_of_channels),
               return_sequences=True))
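
The answer does not say why a smaller kernel initialization helps. One plausible reading is that activation="relu" replaces the bounded default tanh, so the cell and hidden states have no upper limit and can grow along a long sequence when the input contribution is large. The following is only a toy sketch of that idea in plain Python, not the Keras implementation: the scalar cell, the shared gate weights, the constant input of 50.0, the recurrent weight of 0.5, and the function name max_hidden_state are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    return max(0.0, z)


def max_hidden_state(kernel_scale, x=50.0, recurrent_weight=0.5, steps=200):
    """Run a toy scalar LSTM cell with ReLU activation on a constant input.

    All four gates share one input weight (kernel_scale) and one recurrent
    weight, and biases are zero. These values are hypothetical, chosen only
    to illustrate the effect. Returns the largest |h| seen over the sequence.
    """
    h, c, peak = 0.0, 0.0, 0.0
    for _ in range(steps):
        z = kernel_scale * x + recurrent_weight * h
        i, f, o = sigmoid(z), sigmoid(z), sigmoid(z)  # input, forget, output gates
        g = relu(z)          # candidate value: ReLU instead of the default tanh
        c = f * c + i * g    # cell state update
        h = o * relu(c)      # hidden state: ReLU instead of the default tanh
        peak = max(peak, abs(h))
    return peak


if __name__ == "__main__":
    for scale in (1.0, 1e-5):
        print(f"kernel scale {scale:g}: max |h| = {max_hidden_state(scale):.3g}")
```

In this toy setting the hidden state stays tiny with the small kernel scale and grows by many orders of magnitude with the large one. If the same mechanism is at work in the real model, scaling the input data or keeping the default tanh activation might be worth trying as well, though the source does not confirm this.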

Tags: python, lstm, keras, tensorflow, gradient-exploding