Keras loss: nan on longer lstm lookbacks

I'm trying to fit a sequence with an LSTM over a sliding window. This usually works fine when I pick short windows (e.g. length 16), but when I increase the length (e.g. to 128) the loss becomes nan. In between, it sometimes takes a few epochs before the nan shows up; until it does, the loss looks normal and trends downward.
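The post never shows how trainX and trainY are built; for context, here is a minimal sketch of one way to construct sliding windows matching the model's (look_back, 1) input and (1, 1) output shapes below (the function name and the single-feature assumption are mine, not from the original):

import numpy as np

def make_windows(series, look_back):
    # Each sample is a window of `look_back` past values; the target is
    # the single value immediately after the window.
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    X = np.asarray(X, dtype=np.float32).reshape(-1, look_back, 1)
    y = np.asarray(y, dtype=np.float32).reshape(-1, 1, 1)  # matches the (1, 1) model output
    return X, y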

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

batch_size = 16
look_back = 64
neurons = 200

model = Sequential()
# Encoder: collapse the look_back window into a single vector
model.add(LSTM(neurons, activation='relu', input_shape=(look_back, 1), stateful=False, return_sequences=False))
model.add(RepeatVector(1))
# Decoder: expand back into a length-1 sequence and predict one value
model.add(LSTM(neurons, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(Dense(1, activation='relu'))

opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(loss='mse', optimizer=opt)

for i in range(5):
    print("Epoch: ", i + 1)
    model.fit(trainX[:, :, 0:1], trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=True)

Example output for the "several good epochs, then nan" case:

Epoch:  1
209/209 - 7s - loss: 36917764.0000
Epoch:  2
209/209 - 4s - loss: 19063908.0000
Epoch:  3
209/209 - 4s - loss: 18515792.0000
Epoch:  4
209/209 - 4s - loss: 17864662.0000
Epoch:  5
209/209 - 4s - loss: 16512718.0000
Epoch:  6
209/209 - 4s - loss: nan
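
Since the failure only shows up after several epochs, it can save time to abort the run the moment the loss degenerates. Keras ships a built-in TerminateOnNaN callback for exactly this; a sketch with a single fit call so the callback can stop the whole run (using it is my addition, not part of the original experiment):

# Stops training at the first batch whose loss is nan or inf
model.fit(trainX[:, :, 0:1], trainY, epochs=5, batch_size=batch_size,
          verbose=2, shuffle=True,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])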

In case it helps, here are some numbers from my data:

[ 9594, 10747,  8220,  9163, 16074,  8213, 14740, 15809, 12513,
   14814,  8208, 15961, 10893, 13283, 14807,  8119, 12667, 11131,
   12904, 15882,  8971,  7129, 10744, 17505,  9050,  8592,  4318,
   18266, 15951, 16162,  4242,  5157,  7882, 14119,  2265,  5868,
   16123,  7904, 10662, 13519,  8903,  8068,  7828,  3213, 14888,
   23663,  8522, 14963,  8304, 11046,  9972, 19193,  7587, 11668,
    7898, 11682, 14950,  5196, 19456,   287, 13887, 10674, 12437,
   10740, 16827, 12054,  3617, 14235, 23124,  4781, 14021, 13468,
   14170, 13189,  8370,  7129,  8988,  7445, 11430, 18196, 14355,
    3954, 17600, 17026, 16390, 16959, 11966,  8519, 13435, 17974,
    9355, 17052, 11744,  4859, 16085, 20042,  5729, 17748,  9527,
    7438, 14347,  6874,  2329, 17259, 16964, 10768, 15212, 13381,
    8910,  3514,  4117, 15279, 13037,  1081, 12532, 12044, 13742,
   12286, 19194,  8590, 10049,  8129,  3537, 15993, 11127,  8771,
   22610,  9671]
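
Worth noting: these raw values span roughly 300 to 23,000, which is consistent with the ~1e7 MSE in the logs and means the relu layers see very large activations. Rescaling the series before windowing is a common complementary mitigation; a sketch using scikit-learn's MinMaxScaler (this step is a suggestion of mine, not something from the original post):

from sklearn.preprocessing import MinMaxScaler

# Fit on the training portion only, to avoid leaking test statistics
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_series.reshape(-1, 1))  # train_series: hypothetical 1-D array
# Invert after prediction to get back to the original units:
# preds = scaler.inverse_transform(preds_scaled.reshape(-1, 1))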

It turned out I was hitting serious exploding gradients. Clipping them to a norm of 1 fixed the problem:

opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.)
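
For completeness, the clipped optimizer slots straight back into the same compile call; clipnorm=1. rescales any per-variable gradient whose L2 norm exceeds 1, so the update steps stay bounded even when the raw gradients explode. Keras also accepts clipvalue as an element-wise alternative (mentioning it is my addition):

model.compile(loss='mse', optimizer=opt)
# Element-wise alternative: cap each gradient entry at +/-0.5
# opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipvalue=0.5)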