Keras loss: nan on longer lstm lookbacks
I am trying to fit a sequence with an LSTM over a sliding window. With short windows (e.g. length 16) this generally works fine, but if I increase the length (e.g. to 128) the loss becomes nan. For lengths in between, it sometimes takes several epochs before the nan shows up; until it does, the loss looks normal and trends downward.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

batch_size = 16
look_back = 64
neurons = 200

model = Sequential()
model.add(LSTM(neurons, activation='relu', input_shape=(look_back, 1), stateful=False, return_sequences=False))
model.add(RepeatVector(1))
model.add(LSTM(neurons, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(Dense(1, activation='relu'))

opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(loss='mse', optimizer=opt)

for i in range(5):
    print("Epoch: ", i+1)
    model.fit(trainX[:,:,0:1], trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=True)
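For reference, trainX and trainY are not shown above; here is a minimal sketch of how such sliding windows could be built (the create_dataset helper and the data variable are my assumptions, not the original code):

import numpy as np

def create_dataset(series, look_back):
    # Assumed helper: slide a window of length look_back over the series;
    # each window is paired with the single next value as its target.
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    X = np.array(X, dtype=np.float32).reshape(-1, look_back, 1)  # (samples, timesteps, features)
    y = np.array(y, dtype=np.float32).reshape(-1, 1, 1)          # matches the (batch, 1, 1) model output
    return X, y

trainX, trainY = create_dataset(data, look_back)  # data: the raw 1-D series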
Example output for the "several good epochs, then nan" case:
Epoch: 1
209/209 - 7s - loss: 36917764.0000
Epoch: 2
209/209 - 4s - loss: 19063908.0000
Epoch: 3
209/209 - 4s - loss: 18515792.0000
Epoch: 4
209/209 - 4s - loss: 17864662.0000
Epoch: 5
209/209 - 4s - loss: 16512718.0000
Epoch: 6
209/209 - 4s - loss: nan
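As an aside, to abort training the moment this happens rather than grinding through further epochs, the built-in TerminateOnNaN callback can be attached to the same fit() call (a sketch; everything else stays as above):

model.fit(trainX[:,:,0:1], trainY,
          epochs=1, batch_size=batch_size, verbose=2, shuffle=True,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])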
In case it helps, here are some numbers from my data:
[ 9594, 10747, 8220, 9163, 16074, 8213, 14740, 15809, 12513,
14814, 8208, 15961, 10893, 13283, 14807, 8119, 12667, 11131,
12904, 15882, 8971, 7129, 10744, 17505, 9050, 8592, 4318,
18266, 15951, 16162, 4242, 5157, 7882, 14119, 2265, 5868,
16123, 7904, 10662, 13519, 8903, 8068, 7828, 3213, 14888,
23663, 8522, 14963, 8304, 11046, 9972, 19193, 7587, 11668,
7898, 11682, 14950, 5196, 19456, 287, 13887, 10674, 12437,
10740, 16827, 12054, 3617, 14235, 23124, 4781, 14021, 13468,
14170, 13189, 8370, 7129, 8988, 7445, 11430, 18196, 14355,
3954, 17600, 17026, 16390, 16959, 11966, 8519, 13435, 17974,
9355, 17052, 11744, 4859, 16085, 20042, 5729, 17748, 9527,
7438, 14347, 6874, 2329, 17259, 16964, 10768, 15212, 13381,
8910, 3514, 4117, 15279, 13037, 1081, 12532, 12044, 13742,
12286, 19194, 8590, 10049, 8129, 3537, 15993, 11127, 8771,
22610, 9671]
It turns out I was running into a serious exploding-gradient problem. Clipping the gradient norm to 1 solved it:
opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.)
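For context, the fix drops straight into the existing compile step. clipvalue is the element-wise alternative to the global-norm clipping used here; the 0.5 threshold below is an arbitrary illustration on my part, not part of the original fix:

# Clip the global gradient norm to 1 before each Adam update.
opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.)
model.compile(loss='mse', optimizer=opt)

# Alternative: clip each gradient element into [-0.5, 0.5] instead of the global norm.
# opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipvalue=0.5)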