Truncated Backpropagation (BPTT) for RNNs in TensorFlow
https://www.tensorflow.org/tutorials/recurrent#truncated_backpropagation
Here, the official TensorFlow documentation says:
"In order to make the learning process tractable, it is common practice to create an 'unrolled' version of the network, which contains a fixed number (num_steps) of LSTM inputs and outputs."
The documentation includes the following code:
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state

# After some code lines...

numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss
These lines implement the truncated backpropagation (BPTT) part, but I am not sure the code above is actually necessary. Does TensorFlow (I'm using 1.3) perform proper backpropagation automatically, even without a hand-written backpropagation implementation? And does adding the BPTT implementation code noticeably improve prediction accuracy?
The code above feeds the RNNCell at the next time step with the state returned by the RNN layer at the previous time step. According to the official documentation, an RNN layer (GRUCell, LSTMCell, ...) returns a tuple of output and state, but I build my model with the output only and never touch the state: I pass the output to a fully connected layer, reshape it, and then compute the loss with tf.losses.softmax_cross_entropy.
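Roughly, the setup looks like this (a minimal sketch only; the sizes and names such as input_dim, num_classes, and onehot_labels are illustrative, not the actual model):

import tensorflow as tf

# Illustrative sizes; these names are assumptions, not from the actual model.
batch_size, num_steps, input_dim, lstm_size, num_classes = 32, 20, 50, 128, 10

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
onehot_labels = tf.placeholder(tf.float32, [batch_size, num_steps, num_classes])

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# dynamic_rnn returns (outputs, final_state); only the outputs are used here.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Reshape the per-timestep outputs, run them through a fully connected layer,
# and compute the loss with tf.losses.softmax_cross_entropy as described above.
logits = tf.layers.dense(tf.reshape(outputs, [-1, lstm_size]), num_classes)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.reshape(onehot_labels, [-1, num_classes]),
    logits=logits)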
"Does TensorFlow (I'm using 1.3) conduct proper backpropagation automatically, even if a hand-written backpropagation implementation is absent?"
Yes! TensorFlow performs automatic differentiation, which effectively implements BPTT through the unrolled graph.
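In other words, the only "backward pass" code you need to write is the optimizer call. A minimal sketch, assuming an explicitly unrolled LSTM like the documentation snippet and purely illustrative sizes and names:

import tensorflow as tf

# Illustrative sizes; all names here are assumptions for the sketch.
batch_size, num_steps, input_dim, lstm_size = 4, 5, 8, 16

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
targets = tf.placeholder(tf.float32, [batch_size, lstm_size])

# Explicitly unrolled LSTM, as in the quoted documentation snippet.
cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
state = cell.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    output, state = cell(inputs[:, i, :], state)

loss = tf.losses.mean_squared_error(labels=targets, predictions=output)

# This is all the "backpropagation" you write: minimize() uses reverse-mode
# automatic differentiation to add gradient ops that flow back through every
# unrolled time step, i.e. truncated BPTT over the fixed num_steps window.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)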
"Does putting in the BPTT implementation code increase prediction accuracy noticeably?"
Your link is broken now, but perhaps they wrote it this way just to show what the equivalent computation is? I see no reason to believe it would improve accuracy.
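For reference, what the quoted snippet is doing is carrying the LSTM state from one batch into the next, so each truncated window starts where the previous one ended rather than from zeros. A minimal sketch of that pattern with tf.nn.dynamic_rnn and feedable state placeholders (all sizes and names are illustrative assumptions):

import numpy as np
import tensorflow as tf

# Illustrative sizes; these names are assumptions for the sketch.
batch_size, num_steps, input_dim, lstm_size = 32, 20, 50, 128

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
# Feedable placeholders for the LSTM's (c, h) state.
c_in = tf.placeholder(tf.float32, [batch_size, lstm_size])
h_in = tf.placeholder(tf.float32, [batch_size, lstm_size])
initial_state = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)

# Dummy data standing in for consecutive windows of a longer sequence.
batches = [np.random.randn(batch_size, num_steps, input_dim).astype(np.float32)
           for _ in range(3)]

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # Start from zeros, then feed each window's final state into the next one.
    state = (np.zeros([batch_size, lstm_size], np.float32),
             np.zeros([batch_size, lstm_size], np.float32))
    for batch in batches:
        state, outs = session.run(
            [final_state, outputs],
            feed_dict={inputs: batch, c_in: state[0], h_in: state[1]})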