Interpreting loss in LSTM tensorflow
I am using the PTB dataset to predict the next word.
My code: pastebin link.
The input to the model (Batch_input) is words, with a vocabulary_size of 10000. All of the outputs (Batch_labels) are one-hot encoded; you can see a sample of them in the output section below.
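The one-hot encoding is roughly equivalent to this NumPy sketch (the word ids here are illustrative values; the real code is in the pastebin):

import numpy as np

vocabulary_size = 10000

# Ids of the target (next) words for one batch -- illustrative values only.
next_word_ids = np.array([9971, 2, 9256, 1], dtype=np.int32)

# One row per target word with a single 1.0 in that word's column, matching
# the (batch, 10000) float32 arrays printed as Batch_labels below.
batch_labels = np.zeros((len(next_word_ids), vocabulary_size), dtype=np.float32)
batch_labels[np.arange(len(next_word_ids)), next_word_ids] = 1.0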
The following is the output after training the LSTM model.
Output: pastebin link.
Here is part of the output:
Initialized
('Loss :', 9.2027139663696289)
('Batch_input :', array([9971, 9972, 9974, 9975, 9976, 9980, 9981, 9982, 9983, 9984, 9986,
9987, 9988, 9989, 9991, 9992, 9993, 9994, 9995, 9996, 9997, 9998,
9999, 2, 9256, 1, 3, 72, 393, 33, 2133, 0, 146,
19, 6, 9207, 276, 407, 3, 2, 23, 1, 13, 141,
4, 1, 5465, 0, 3081, 1596, 96, 2, 7682, 1, 3,
72, 393, 8, 337, 141, 4, 2477, 657, 2170], dtype=int32))
('Batch_labels :', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
Average loss at step 0: 0.092027 learning rate: 1.000000
('Label: ', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
('Predicted:', array([[-0.36508381, -0.25612 , -0.26035795, ..., -0.42688274,
-0.4078168 , -0.36345699],
[-0.46035308, -0.27282876, -0.34078932, ..., -0.50623679,
-0.47014061, -0.43237451],
[-0.14694197, -0.07506246, -0.10392818, ..., -0.1128526 ,
-0.12404554, -0.13495158],
...,
[-0.07286638, -0.04560997, -0.05932444, ..., -0.08352474,
-0.07679331, -0.07829094],
[-0.13576414, -0.07057529, -0.1017022 , ..., -0.11192483,
-0.14713599, -0.11757012],
[-0.05446544, -0.02738103, -0.03401792, ..., -0.05073205,
-0.03746928, -0.05750648]], dtype=float32))
================================================================================
[[ 0. 0. 0. ..., 0. 0. 0.]]
8605
('f', u'altman')
('as', u'altman')
('feed', array([8605]))
('Sentence :', u'altman rake years regatta memotec pierre <unk> nonexecutive as will <eos> ssangyong director nahb group the cluett rubens snack-food fromstein calloway and memotec a board years regatta publishing fields rake group group rake cluett ssangyong pierre calloway memotec gitano gold rubens as as director sim is publishing gitano punts join <unk> and a old punts years memotec a rake is guterman cluett ssangyong will berlitz nahb <eos> of group join <unk> board join and pierre consolidated board cluett dutch gold as ipo ssangyong guterman a kia will dutch and director centrust consolidated rudolph guterman guterman cluett years n.v. old board rubens ')
================================================================================
('Loss :', 496.78199882507323)
('Batch_input :', array([4115, 5, 14, 45, 55, 3, 72, 195, 1244, 220, 2,
0, 3150, 7426, 1, 13, 4052, 1, 496, 14, 6885, 0,
1, 22, 113, 2652, 8068, 5, 14, 2474, 5250, 10, 464,
52, 3004, 466, 1244, 15, 2, 1, 80, 0, 167, 4,
35, 2645, 1, 65, 10, 558, 6092, 3574, 1898, 666, 1,
7, 27, 1, 4241, 6036, 7, 3, 2, 366], dtype=int32))
('Batch_labels :', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 1., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
Average loss at step 100: 4.967820 learning rate: 1.000000
('Label: ', array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 1., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]], dtype=float32))
('Predicted:', array([[ 4.41551352e+00, 9.98007679e+00, 1.75690575e+01, ...,
6.83443546e+00, -2.30797195e+00, 1.73750782e+00],
[ 1.26826172e+01, 5.96618652e-03, 1.18247871e+01, ...,
-3.70885038e+00, -8.55356884e+00, -9.16959190e+00],
[ 1.44652233e+01, 5.12977028e+00, 9.42045784e+00, ...,
1.39444172e+00, 1.95213389e+00, -4.00810099e+00],
...,
[ 2.93052626e+00, 9.41266441e+00, 1.79130135e+01, ...,
4.24245834e+00, -1.46551771e+01, -3.35697136e+01],
[ 2.48945675e+01, 2.32091904e+01, 2.47276134e+01, ...,
-6.39845896e+00, -2.66628218e+00, -4.59843445e+00],
[ 1.34414902e+01, 4.80197811e+00, 1.89214745e+01, ...,
-5.91268682e+00, -8.80736637e+00, -6.49542713e+00]], dtype=float32))
================================================================================
[[ 0. 0. 0. ..., 0. 0. 0.]]
3619
('f', u'officially')
('as', u'officially')
('feed', array([3619]))
('Sentence :', u'officially <unk> to <eos> filters ago cigarettes is that cigarette stopped to <eos> researchers <unk> to <eos> filters ago cigarettes asbestos the filters ago cigarettes asbestos the filters ago cigarettes is that cigarette up the <eos> researchers to <eos> researchers <unk> to <eos> filters ago cigarettes asbestos the filters ago cigarettes asbestos <eos> filters ago cigarettes asbestos the filters ago cigarettes is that cigarette up the <eos> researchers <unk> to <eos> researchers <unk> to <eos> filters ago cigarettes asbestos of percentage years the the the <eos> researchers <unk> to <eos> filters ago cigarettes asbestos the filters ago cigarettes asbestos the filters ')
================================================================================
The initial loss is 9.20 and the predicted text is given above. The next loss, at step 100, is around 4.97. But as the number of steps increases, the loss increases, which is abnormal (right?). Also, the next predicted word in the output, 'among', repeats at step 500.
Is there a mistake in the training?
Here is the new output I got: pastebin link.
I'm not 100% sure what the problem in your code is, but I noticed that your learning rate starts at 1.
learning_rate = tf.train.exponential_decay(1.0, global_step, 5000, 0.1, staircase=True)
Try choosing a lower initial value.
A high learning rate makes the model's weights take big jumps, so it can miss the minimum and may even land at a point where the loss is higher (which is probably your case). It's like leaping from one side of a valley to the other instead of descending into it.
Image reference: http://cs231n.github.io/neural-networks-3/
Lowering the learning rate from 1e-2 to 1e-4 solved a similar problem in a different model. Your model may work well with a different learning rate.
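A rough sketch of that suggestion (the 0.01 starting value, the stand-in loss, and the gradient clipping are illustrative, not taken from your code):

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)

# Same schedule as above, but decaying from 0.01 instead of 1.0.
learning_rate = tf.train.exponential_decay(0.01, global_step, 5000, 0.1,
                                           staircase=True)

# Stand-in loss so the snippet runs; in your model this would be the
# cross-entropy loss over the softmax outputs.
w = tf.Variable(5.0)
loss = tf.square(w)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# Clipping gradients is a common extra safeguard against the large jumps
# described above.
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]
train_op = optimizer.apply_gradients(clipped, global_step=global_step)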