Difference between final cell state and RNN output in LSTM Tensorflow?
I am trying to understand LSTM in Tensorflow. I am doing simple classification using tf.nn.bidirectional_dynamic_rnn, which returns two things: one is the output of every cell, and the second is the hidden state of only the last cell. Now my confusion is this: if I take the final output for the next fully_connected layer, it takes far too much time and too many iterations (even 10000 iterations are not enough) to reduce the loss, whereas if I take the final state output for the next layer, it gives good results in only 500 iterations:
My data for the classification is:
vocab_ = {'\xa0': 60, 'S': 26, 'W': 30, 'É': 62, 'Á': 61, 'ò': 75, 'ê': 71, 'õ': 77, 'ñ': 74, 'J': 17, 'o': 48, ',': 3, "'": 2, 'g': 40, 'Q': 24, 'ż': 87, 'B': 9, 'ç': 68, 'O': 22, 'N': 21, 'D': 11, 'd': 37, 'x': 57, 'q': 50, 'L': 19, 'z': 59, 'U': 28, 'F': 13, 'w': 56, 't': 53, 'h': 41, 'j': 43, '1': 6, 'r': 51, 'e': 38, 'K': 18, 'k': 44, 'ú': 80, 'a': 34, 'ü': 81, 'é': 70, 'I': 16, 'Y': 32, 'ì': 72, 'ó': 76, 'A': 8, 'c': 36, 'E': 12, 'i': 42, 'G': 14, 'à': 64, 'y': 58, 'V': 29, 'C': 10, 'X': 31, 'ä': 67, '0': 0, 'b': 35, 's': 52, '/': 5, 'n': 47, 'p': 49, 'ö': 78, 'ą': 82, ' ': 1, 'Ż': 86, 'l': 45, 'á': 65, 'ù': 79, ':': 7, 'u': 54, 'Z': 33, 'è': 69, 'Ś': 85, 'm': 46, '-': 4, 'ł': 83, 'T': 27, 'P': 23, 'ń': 84, 'R': 25, 'í': 73, 'ã': 66, 'ß': 63, 'v': 55, 'M': 20, 'H': 15, 'f': 39}
sequences=[[18, 41, 48, 54, 51, 58, 0, 0],[18, 41, 48, 54, 51, 58, 0, 0], [21, 34, 41, 34, 52, 0, 0, 0], [11, 34, 41, 38, 51, 0, 0, 0], [14, 38, 51, 40, 38, 52, 0, 0], [21, 34, 59, 34, 51, 42, 0, 0], [20, 34, 34, 45, 48, 54, 39, 0], [14, 38, 51, 40, 38, 52, 0, 0], [21, 34, 42, 39, 38, 41, 0, 0], [14, 54, 42, 51, 40, 54, 42, 52], [9, 34, 35, 34, 0, 0, 0, 0], [26, 34, 35, 35, 34, 40, 41, 0], [8, 53, 53, 42, 34, 0, 0, 0], [27, 34, 41, 34, 47, 0, 0, 0], [15, 34, 37, 37, 34, 37, 0, 0], [8, 52, 56, 34, 37, 0, 0, 0], [21, 34, 43, 43, 34, 51, 0, 0], [11, 34, 40, 41, 38, 51, 0, 0], [20, 34, 45, 48, 48, 39, 0, 0], [16, 52, 34, 0, 0, 0, 0, 0], [8, 52, 40, 41, 34, 51, 0, 0], [21, 34, 37, 38, 51, 0, 0, 0], [14, 34, 35, 38, 51, 0, 0, 0], [8, 35, 35, 48, 54, 37, 0, 0], [20, 34, 34, 45, 48, 54, 39, 0], [33, 48, 40, 35, 58, 0, 0, 0], [26, 51, 48, 54, 51, 0, 0, 0], [9, 34, 41, 34, 51, 0, 0, 0], [20, 54, 52, 53, 34, 39, 34, 0], [15, 34, 47, 34, 47, 42, 34, 0], [11, 34, 41, 38, 51, 0, 0, 0], [27, 54, 46, 34, 0, 0, 0, 0], [21, 34, 41, 34, 52, 0, 0, 0], [26, 34, 45, 42, 35, 34, 0, 0], [26, 41, 34, 46, 48, 48, 47, 0]]
labels_x = [9, 0, 12, 4, 8, 12, 6, 1, 6, 7, 11, 14, 8, 4, 0, 5, 7, 12, 2, 5, 3, 9, 14, 1, 10, 12, 12, 14, 2, 2, 12, 13, 0, 2, 11]
First, if I take the final output instead of the state output, it needs more iterations and the results are not good. Here is the code:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
epoch=2
tf.reset_default_graph()
input_x = tf.placeholder(tf.int32,shape=[None,None])
output_y = tf.placeholder(tf.int32,shape=[None,])
word_embedding = tf.get_variable('embedding',shape=[len(vocab_),250],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
sequence_len= tf.count_nonzero(input_x,axis=-1)
with tf.variable_scope('encoder') as scope:
    output,state_output=tf.nn.bidirectional_dynamic_rnn(tf.nn.rnn_cell.LSTMCell(150),tf.nn.rnn_cell.LSTMCell(150),inputs=tf.nn.embedding_lookup(word_embedding,input_x),sequence_length=sequence_len,dtype=tf.float32)
transpose_w=tf.transpose(output[0],[1,0,2])
transpose_r=tf.transpose(output[1],[1,0,2])
final_output= tf.concat([transpose_r[-1],transpose_w[-1]],axis=-1)
weights=tf.get_variable('weights',shape=[2*150,len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
bias = tf.get_variable('bias',shape=[len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
final_result = tf.matmul(final_output,weights) + bias
#normalization
prob=tf.nn.softmax(final_result)
pred=tf.argmax(prob,axis=-1)
#cross entropy
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=final_result,labels=output_y)
loss=tf.reduce_mean(ce)
#evaluate
acc=tf.reduce_mean(tf.cast((tf.equal(tf.cast(pred,tf.int32),output_y)),tf.float32))
#train
train=tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epoch):
        for j in range(200):
            first,second,third,forth,fifth,_=sess.run([loss,prob,pred,final_result,acc,train],feed_dict={input_x:sequences,output_y:labels_x})
            print("Iteration {}th epoch {}th loss {} accuracy {} ".format(j,i,first,fifth))
Output:
Iteration 0th epoch 0th loss 3.558173179626465 accuracy 0.02857142873108387
Iteration 1th epoch 0th loss 3.556957960128784 accuracy 0.02857142873108387
Iteration 2th epoch 0th loss 3.5557243824005127 accuracy 0.05714285746216774
.
.
.
Iteration 197th epoch 1th loss 3.102834939956665 accuracy 0.20000000298023224
Iteration 198th epoch 1th loss 3.1021459102630615 accuracy 0.20000000298023224
Iteration 199th epoch 1th loss 3.101456880569458 accuracy 0.20000000298023224
Process finished with exit code 0
As you can see, after 400 iterations the results are not good and the accuracy is only 0.20. Now, if I take the hidden state output instead of the final output, the code is:
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn
epoch=2
tf.reset_default_graph()
input_x = tf.placeholder(tf.int32,shape=[None,None])
output_y = tf.placeholder(tf.int32,shape=[None,])
word_embedding = tf.get_variable('embedding',shape=[len(vocab_),250],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
sequence_len= tf.count_nonzero(input_x,axis=-1)
with tf.variable_scope('encoder') as scope:
    output,state_output=tf.nn.bidirectional_dynamic_rnn(tf.nn.rnn_cell.LSTMCell(150),tf.nn.rnn_cell.LSTMCell(150),inputs=tf.nn.embedding_lookup(word_embedding,input_x),sequence_length=sequence_len,dtype=tf.float32)
transpose_w=tf.transpose(output[0],[1,0,2])
transpose_r=tf.transpose(output[1],[1,0,2])
state_out = tf.concat([state_output[0].c,state_output[1].c],axis=-1)
weights=tf.get_variable('weights',shape=[2*150,len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
bias = tf.get_variable('bias',shape=[len(labels_x)],dtype=tf.float32,initializer=tf.random_uniform_initializer(-0.01,0.01))
final_result = tf.matmul(state_out,weights) + bias
#normalization
prob=tf.nn.softmax(final_result)
pred=tf.argmax(prob,axis=-1)
#cross entropy
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=final_result,labels=output_y)
loss=tf.reduce_mean(ce)
#evaluate
acc=tf.reduce_mean(tf.cast((tf.equal(tf.cast(pred,tf.int32),output_y)),tf.float32))
#train
train=tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epoch):
        for j in range(200):
            first,second,third,forth,fifth,_=sess.run([loss,prob,pred,final_result,acc,train],feed_dict={input_x:sequences,output_y:labels_x})
            print("Iteration {}th epoch {}th loss {} accuracy {} ".format(j,i,first,fifth))
The output is:
Iteration 0th epoch 0th loss 3.557037830352783 accuracy 0.0
Iteration 1th epoch 0th loss 3.553581476211548 accuracy 0.11428571492433548
Iteration 2th epoch 0th loss 3.549212694168091 accuracy 0.17142857611179352
Iteration 3th epoch 0th loss 3.5429491996765137 accuracy 0.2857142984867096
.
.
.
.
.
Iteration 197th epoch 1th loss 0.19866235554218292 accuracy 0.8571428656578064
Iteration 198th epoch 1th loss 0.19868074357509613 accuracy 0.8571428656578064
Iteration 199th epoch 1th loss 0.19868910312652588 accuracy 0.8571428656578064
Process finished with exit code 0
As you can see, it gives good accuracy within the same number of iterations. But if you look at various LSTM classification repositories on GitHub, or at any tutorial, you will find that everyone uses the final output rather than the last state output. Am I making some mistake in how I take the final output, and is that why I am not getting good results? Please guide me.
Thanks in advance.
This is not a complete answer, but here are a few points that may help you.
I am doing simple classification using tf.nn.bidirectional_dynamic_rnn, which returns two things: one is the output of every cell, and the second is the hidden state of only the last cell.
That is correct. But when you are using an LSTM, according to the documentation, the output of tf.nn.bidirectional_dynamic_rnn is the pair (outputs, state), where state is an LSTMStateTuple containing the hidden state and the cell state of the last cell, as determined by the sequence_length parameter, for every example in the batch.
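To make the shapes concrete, here is a small sketch (mine, not from the original post) of what the encoder in the question returns, assuming a batch of 35 padded sequences of length 8 and 150 units per direction:
# Structure returned by tf.nn.bidirectional_dynamic_rnn for this encoder
# (shapes assume batch=35, max time=8, 150 units per direction):
#
# output       -> (output_fw, output_bw)
#                 output_fw: [35, 8, 150]  per-timestep outputs, forward cell
#                 output_bw: [35, 8, 150]  per-timestep outputs, backward cell
# state_output -> (LSTMStateTuple(c=[35, 150], h=[35, 150]),   final forward state
#                  LSTMStateTuple(c=[35, 150], h=[35, 150]))   final backward state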
Given that you are classifying the whole sequence (not each timestep), the last state of the LSTM carries the accumulated information of all previous states as well as the output of the last real timestep (according to sequence_length). So using only the cell state is fine, because from it you already get everything the network gathered over the earlier steps of the sequence. That is why it works well.
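For illustration, a minimal sketch of the two obvious reductions of state_output, both of which already respect sequence_length and therefore ignore the padding. The .c variant is exactly what the question's second version does; the .h variant is my own addition, not something from the post:
# Concatenate the final cell states of both directions (as in the question) ...
state_out_c = tf.concat([state_output[0].c, state_output[1].c], axis=-1)
# ... or the final hidden outputs, i.e. the outputs of the last real timestep.
state_out_h = tf.concat([state_output[0].h, state_output[1].h], axis=-1)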
Here, from the pair (outputs, state), outputs contains the outputs of all the cells. Remember that you padded every sequence with 0 to make the sequences the same size. If t is greater than the sequence_length of a particular example, the output of the t-th cell is just zeros, whereas the cell state is simply copied forward unchanged from the previous cell once t exceeds the sequence_length.
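You can verify this claim directly. A minimal sketch, assuming the session and feed_dict from the question's training loop:
out_fw, seq = sess.run([output[0], sequence_len],
                       feed_dict={input_x: sequences, output_y: labels_x})
k = 0                                    # first example in the batch
print(out_fw[k, seq[k]:, :].any())       # False: forward outputs after the real length are all zeros
print(out_fw[k, seq[k] - 1, :5])         # last real forward output (equal to state_output[0].h for this example)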
Now, if you take the outputs of the LSTM the way you did, you are taking the outputs of all the cells, including the padded-zero cells that should be discarded. This is likely what is creating the problem.
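If you do want to work from outputs, as most tutorials do, the fix is to pick the output of the last real timestep of each example instead of transpose_w[-1]/transpose_r[-1], which for a padded example is the zero output of the last padded cell. A minimal sketch under that assumption, reusing the question's output and sequence_len tensors:
# Index of the last real timestep for every example in the batch.
last_index = tf.cast(sequence_len, tf.int32) - 1
batch_range = tf.range(tf.shape(output[0])[0])
gather_idx = tf.stack([batch_range, last_index], axis=1)

last_fw = tf.gather_nd(output[0], gather_idx)   # forward output at the last real step
last_bw = output[1][:, 0, :]                    # backward output at step 0 has already read the whole sequence
final_output = tf.concat([last_fw, last_bw], axis=-1)  # [batch, 2*150], equivalent to concatenating state_output[*].h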