How to use previous output and hidden states from LSTM for the attention mechanism?
I am currently trying to code the attention mechanism from this paper: "Effective Approaches to Attention-based Neural Machine Translation", Luong, Pham, Manning (2015). (I use global attention with the dot score.)
However, I am not sure how to feed in the hidden and output states from the LSTM decoder. The issue is that the input of the LSTM decoder at time t depends on quantities that I need to compute using the output and hidden state from t-1.
Here is the relevant part of the code:
import tensorflow as tf

with tf.variable_scope('data'):
    prob = tf.placeholder_with_default(1.0, shape=())
    X_or = tf.placeholder(shape=[batch_size, timesteps_1, num_input], dtype=tf.float32, name="input")
    X = tf.unstack(X_or, timesteps_1, 1)
    y = tf.placeholder(shape=[window_size, 1], dtype=tf.float32, name="label_annotation")
    logits = tf.zeros((1, 1), tf.float32)

with tf.variable_scope('lstm_cell_encoder'):
    rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [hidden_size, hidden_size]]
    multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
    lstm_outputs, lstm_state = tf.contrib.rnn.static_rnn(cell=multi_rnn_cell, inputs=X, dtype=tf.float32)
    concat_lstm_outputs = tf.stack(tf.squeeze(lstm_outputs))
    last_encoder_state = lstm_state[-1]

with tf.variable_scope('lstm_cell_decoder'):
    initial_input = tf.unstack(tf.zeros(shape=(1, 1, hidden_size2)))
    rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple=True)

    # Compute the hidden and output of h_1
    for index in range(window_size):
        output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, initial_input, initial_state=last_encoder_state, dtype=tf.float32)

        # Compute the score for each source output vector (dot score)
        scores = tf.matmul(concat_lstm_outputs, tf.reshape(output_decoder[-1], (hidden_size, 1)))
        attention_coef = tf.nn.softmax(scores)
        context_vector = tf.reduce_sum(tf.multiply(concat_lstm_outputs, tf.reshape(attention_coef, (window_size, 1))), 0)
        context_vector = tf.reshape(context_vector, (1, hidden_size))

        # Compute the tilde hidden state \tilde{h}_t = tanh(W[c_t, h_t] + b)
        concat_context = tf.concat([context_vector, output_decoder[-1]], axis=1)
        W_tilde = tf.Variable(tf.random_normal(shape=[hidden_size * 2, hidden_size2], stddev=0.1), name="weights_tilde", trainable=True)
        b_tilde = tf.Variable(tf.zeros([1, hidden_size2]), name="bias_tilde", trainable=True)
        hidden_tilde = tf.nn.tanh(tf.matmul(concat_context, W_tilde) + b_tilde)  # hidden_tilde is [1, hidden_size2]

        # Update for the next time step
        initial_input = tf.unstack(tf.reshape(hidden_tilde, (1, 1, hidden_size2)))
        last_encoder_state = state_decoder

        # Predict the target
        W_target = tf.Variable(tf.random_normal(shape=[hidden_size2, 1], stddev=0.1), name="weights_target", trainable=True)
        logit = tf.matmul(hidden_tilde, W_target)
        logits = tf.concat([logits, logit], axis=0)

logits = logits[1:]
The part inside the loop is what I am unsure about: does TensorFlow remember the computational graph when I overwrite the Python variables "initial_input" and "last_encoder_state"?
I think your model would be simplified a lot if you used tf.contrib.seq2seq.AttentionWrapper with one of its implementations: BahdanauAttention or LuongAttention.
This way the attention vector is wired in at the cell level, so that the cell output already has attention applied. An example from the seq2seq tutorial:
cell = tf.nn.rnn_cell.LSTMCell(512)
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
# Note: in the released tf.contrib.seq2seq the keyword argument is attention_layer_size
attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_layer_size=256)
Note that this way you do not need the loop over window_size, because tf.nn.static_rnn or tf.nn.dynamic_rnn will instantiate the attention-wrapped cell for you (see the sketch below).
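For concreteness, here is a minimal sketch (TF 1.x, my own illustration rather than the tutorial's code) of feeding the wrapped cell to tf.nn.dynamic_rnn; encoder_outputs, decoder_inputs and batch_size are assumed to exist elsewhere in the graph:

import tensorflow as tf

# Assumed inputs (illustrative): encoder_outputs and decoder_inputs are
# [batch_size, time, depth] tensors built elsewhere.
cell = tf.nn.rnn_cell.LSTMCell(512)
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(
    cell, attention_mechanism, attention_layer_size=256)

# dynamic_rnn unrolls the attention-wrapped cell over the time axis, so
# attention is recomputed at every decoder step without a Python loop.
# To start from the encoder's final state one would typically use
# attn_cell.zero_state(...).clone(cell_state=encoder_state).
decoder_outputs, decoder_state = tf.nn.dynamic_rnn(
    attn_cell,
    decoder_inputs,
    initial_state=attn_cell.zero_state(batch_size, tf.float32))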
Regarding your question: you should distinguish between Python variables and TensorFlow graph nodes. You can assign last_encoder_state to point to a different tensor, and the original graph node does not change because of that. This is flexible, but it can also be misleading in the resulting network: you may think that you are connecting an LSTM to one tensor, when it is actually another one. In general, you should not do that.
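To make that concrete, here is a tiny sketch (my own illustration) of what rebinding a Python name does in TF 1.x graph mode:

import tensorflow as tf

a = tf.zeros((1, 1))
b = a + 1.0          # b is wired to the tensor "a" pointed to here
a = tf.ones((1, 1))  # rebinding the Python name only creates a new node;
                     # b still depends on tf.zeros, not on tf.ones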