Truncated Backpropagation (BPTT) for RNNs in TensorFlow
https://www.tensorflow.org/tutorials/recurrent#truncated_backpropagation
Here, the official TensorFlow documentation says:
"In order to make the learning process tractable, it is common practice to create an 'unrolled' version of the network, which contains a fixed number (num_steps) of LSTM inputs and outputs."
The documentation includes the following code:
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state

# After some code lines...

numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss
These lines implement the truncated backpropagation (BPTT) part, but I am not sure the code above is actually necessary. Does TensorFlow (I'm using 1.3) perform proper backpropagation automatically, even without a hand-written backpropagation implementation? And does adding the BPTT implementation code noticeably improve prediction accuracy?
The code above feeds the RNNCell at the next time step with the state returned by the RNN layer at the previous time step. According to the official documentation, an RNN layer (GRUCell, LSTMCell, ...) returns a tuple of output and state, but I build my model with the output only and never touch the state: I pass the output to a fully connected layer, reshape it, and then compute the loss with tf.losses.softmax_cross_entropy.
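Roughly, the setup looks like this (a minimal sketch only; the sizes and names such as input_dim, num_classes, and onehot_labels are illustrative, not the actual model):

import tensorflow as tf

# Illustrative sizes; these names are assumptions, not from the actual model.
batch_size, num_steps, input_dim, lstm_size, num_classes = 32, 20, 50, 128, 10

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
onehot_labels = tf.placeholder(tf.float32, [batch_size, num_steps, num_classes])

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# dynamic_rnn returns (outputs, final_state); only the outputs are used here.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Reshape the per-timestep outputs, run them through a fully connected layer,
# and compute the loss with tf.losses.softmax_cross_entropy as described above.
logits = tf.layers.dense(tf.reshape(outputs, [-1, lstm_size]), num_classes)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.reshape(onehot_labels, [-1, num_classes]),
    logits=logits)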
"Does TensorFlow (I'm using 1.3) conduct proper backpropagation automatically, even if a hand-written backpropagation implementation is absent?"
Yes! TensorFlow performs automatic differentiation, which effectively implements BPTT through the unrolled graph.
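In other words, the only "backward pass" code you need to write is the optimizer call. A minimal sketch, assuming an explicitly unrolled LSTM like the documentation snippet and purely illustrative sizes and names:

import tensorflow as tf

# Illustrative sizes; all names here are assumptions for the sketch.
batch_size, num_steps, input_dim, lstm_size = 4, 5, 8, 16

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
targets = tf.placeholder(tf.float32, [batch_size, lstm_size])

# Explicitly unrolled LSTM, as in the quoted documentation snippet.
cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
state = cell.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    output, state = cell(inputs[:, i, :], state)

loss = tf.losses.mean_squared_error(labels=targets, predictions=output)

# This is all the "backpropagation" you write: minimize() uses reverse-mode
# automatic differentiation to add gradient ops that flow back through every
# unrolled time step, i.e. truncated BPTT over the fixed num_steps window.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)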
"Does putting in the BPTT implementation code increase prediction accuracy noticeably?"
Your link is broken now, but perhaps they wrote it this way just to show what the equivalent computation is? I see no reason to believe it would improve accuracy.
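For reference, what the quoted snippet is doing is carrying the LSTM state from one batch into the next, so each truncated window starts where the previous one ended rather than from zeros. A minimal sketch of that pattern with tf.nn.dynamic_rnn and feedable state placeholders (all sizes and names are illustrative assumptions):

import numpy as np
import tensorflow as tf

# Illustrative sizes; these names are assumptions for the sketch.
batch_size, num_steps, input_dim, lstm_size = 32, 20, 50, 128

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
# Feedable placeholders for the LSTM's (c, h) state.
c_in = tf.placeholder(tf.float32, [batch_size, lstm_size])
h_in = tf.placeholder(tf.float32, [batch_size, lstm_size])
initial_state = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)

# Dummy data standing in for consecutive windows of a longer sequence.
batches = [np.random.randn(batch_size, num_steps, input_dim).astype(np.float32)
           for _ in range(3)]

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # Start from zeros, then feed each window's final state into the next one.
    state = (np.zeros([batch_size, lstm_size], np.float32),
             np.zeros([batch_size, lstm_size], np.float32))
    for batch in batches:
        state, outs = session.run(
            [final_state, outputs],
            feed_dict={inputs: batch, c_in: state[0], h_in: state[1]})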