What does the "source hidden state" refer to in the Attention Mechanism?

Question

The attention weights are computed as follows:

I would like to know what h_s refers to.

In the TensorFlow code, the encoder RNN returns a tuple:

encoder_outputs, encoder_state = tf.nn.dynamic_rnn(...)

I think h_s should be encoder_state, but github/nmt gives a different answer?

# attention_states: [batch_size, max_time, num_units]
attention_states = tf.transpose(encoder_outputs, [1, 0, 2])

# Create an attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units, attention_states,
    memory_sequence_length=source_sequence_length)

Did I misunderstand the code? Or does h_s actually mean encoder_outputs?

Answer 1

The formula is probably from this post, so I'll use the NN diagram from the same post:

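Here, h-bar(s) are all the blue hidden states from the encoder (last layer), and h(t) is the current red hidden state from the decoder (also last layer). In the picture t=0, and you can see which blocks are wired to the attention weights with dotted arrows. The score function is usually one of these: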
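For reference, assuming the post in question is the TensorFlow NMT tutorial (the repository the question calls github/nmt; that identification is an assumption), the attention weights, the context vector, and the two usual score functions can be written in the same notation roughly as:

```latex
% Attention weights: softmax of the scores over all source positions s
\alpha_{ts} = \frac{\exp\big(\mathrm{score}(h_t, \bar{h}_s)\big)}
                   {\sum_{s'=1}^{S} \exp\big(\mathrm{score}(h_t, \bar{h}_{s'})\big)}

% Context vector: weighted average of the source hidden states
c_t = \sum_{s} \alpha_{ts} \, \bar{h}_s

% Two common score functions
\mathrm{score}(h_t, \bar{h}_s) =
\begin{cases}
  h_t^\top W \bar{h}_s & \text{(Luong's multiplicative style)} \\
  v_a^\top \tanh\big(W_1 h_t + W_2 \bar{h}_s\big) & \text{(Bahdanau's additive style)}
\end{cases}
```

In both variants h-bar(s) ranges over every source position s, which is why a single final state cannot play that role.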

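TensorFlow's attention mechanism matches this picture. In theory, a cell's output is in most cases its hidden state (one exception is the LSTM cell, where the output is the short-term part of the state, and even then the output is the better fit for the attention mechanism). In practice, TensorFlow's encoder_state differs from encoder_outputs when the input is padded with zeros and the sequence lengths are supplied: at padded steps the state is copied through from the previous step, while the output is zero. Obviously, you don't want to attend to trailing zeros, so it makes sense to take h-bar(s) from the outputs for these cells.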
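The difference between the two return values can be seen in a small NumPy sketch. This is not TensorFlow code: `masked_rnn` is a hypothetical toy tanh RNN that only imitates the documented `sequence_length` behaviour of `tf.nn.dynamic_rnn` (copy the state through and zero the outputs past the end of each sequence), and all sizes and weights are made up for illustration.

```python
import numpy as np

def masked_rnn(inputs, sequence_length, w_x, w_h):
    """Toy tanh RNN (hypothetical helper, not a TensorFlow API).

    inputs:          [batch_size, max_time, input_dim]
    sequence_length: [batch_size] number of real (unpadded) steps
    Returns (outputs, final_state) with shapes
    [batch_size, max_time, num_units] and [batch_size, num_units].
    """
    batch_size, max_time, _ = inputs.shape
    num_units = w_h.shape[0]
    state = np.zeros((batch_size, num_units))
    outputs = np.zeros((batch_size, max_time, num_units))
    for t in range(max_time):
        new_state = np.tanh(inputs[:, t] @ w_x + state @ w_h)
        valid = (t < sequence_length)[:, None]  # [batch_size, 1]
        # Past the end of a sequence: keep the old state, emit zeros.
        state = np.where(valid, new_state, state)
        outputs[:, t] = np.where(valid, new_state, 0.0)
    return outputs, state

rng = np.random.default_rng(0)
batch_size, max_time, input_dim, num_units = 2, 4, 3, 5
inputs = rng.normal(size=(batch_size, max_time, input_dim))
sequence_length = np.array([4, 2])  # second sequence has two padded steps
w_x = rng.normal(size=(input_dim, num_units))
w_h = rng.normal(size=(num_units, num_units))

encoder_outputs, encoder_state = masked_rnn(inputs, sequence_length, w_x, w_h)
print(encoder_outputs.shape)  # one vector per source position
print(encoder_state.shape)    # only the final state per sequence
```

In this sketch `encoder_outputs[1, 2:]` is all zeros, while `encoder_state[1]` equals the last real output `encoder_outputs[1, 1]`.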

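So encoder_outputs is exactly the arrows going up from the blue blocks: one vector per source position, whereas encoder_state holds only the final state. Later in the code, attention_mechanism is connected to each decoder_cell, so that its output goes through the context vector to the yellow block in the picture.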

decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism,
    attention_layer_size=num_units)
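What the attention mechanism computes from those encoder outputs at one decoder step can be sketched in NumPy. This is a minimal illustration of a Luong-style multiplicative score with length masking, not the library's implementation; the function name `luong_attention`, the weight matrix `w`, and all sizes are assumptions made for the example.

```python
import numpy as np

def luong_attention(query, memory, memory_sequence_length, w):
    """One attention step (illustrative sketch, hypothetical helper).

    query:  [batch_size, num_units]            decoder hidden state h(t)
    memory: [batch_size, max_time, num_units]  encoder outputs, i.e. h-bar(s)
    w:      [num_units, num_units]             weight of the multiplicative score
    Returns (weights, context) with shapes
    [batch_size, max_time] and [batch_size, num_units].
    """
    max_time = memory.shape[1]
    # scores[b, s] = h_t^T W h-bar_s
    scores = np.einsum("bi,ij,bsj->bs", query, w, memory)
    # Padded source positions must receive zero attention weight.
    mask = np.arange(max_time)[None, :] < memory_sequence_length[:, None]
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the source positions.
    scores = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    # Context vector: attention-weighted sum of the encoder outputs.
    context = np.einsum("bs,bsj->bj", weights, memory)
    return weights, context

rng = np.random.default_rng(1)
batch_size, max_time, num_units = 2, 4, 5
memory = rng.normal(size=(batch_size, max_time, num_units))
source_sequence_length = np.array([4, 2])
query = rng.normal(size=(batch_size, num_units))
w = rng.normal(size=(num_units, num_units))

weights, context = luong_attention(query, memory, source_sequence_length, w)
print(weights.sum(axis=1))  # each row of weights sums to 1
print(weights[1])           # zero weight on the two padded positions
```

The memory argument has one row per source position, which is the shape of encoder_outputs rather than of encoder_state.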

nlp

machine-learning

deep-learning

attention-model

sequence-to-sequence