State dimensions in Bahdanau Attention

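I am currently trying to compute this function for Bahdanau attention.

My question is about the hidden states h of the decoder and the encoder.

In one implementation, I see an encoder h with the dimensions: [max source len, batch size, hidden size]

and a decoder h with the following dimensions: [#lstm layers, batch size, hidden dim]

How can I compute the addition if the dimensions have to be the same for the W matrices: https://blog.floydhub.com/attention-mechanism/#bahdanau-att-step1

Thanks for the help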

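In the original paper by Bahdanau, the decoder has only a single recurrent layer. There are multiple ways to deal with multiple layers. The most common one is to do attention between the layers (which you apparently did not do; see, e.g., a paper by Google). If you use multiple decoder layers like this, you can use only the last layer (i.e., do h_decoder[-1]). Alternatively, you can concatenate the layers, i.e., the slices along the 0th dimension (in PyTorch call torch.cat, in TensorFlow tf.concat).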
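The two options for collapsing the layer dimension could look roughly like this in PyTorch. This is only a sketch: the sizes are made up for illustration, and h_decoder is a random stand-in for the hidden state with the shape from the question.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
num_layers, batch_size, hidden_dim = 2, 4, 8

# Stand-in for the decoder hidden state: [num_layers, batch, hidden]
h_decoder = torch.randn(num_layers, batch_size, hidden_dim)

# Option 1: keep only the top (last) layer -> [batch, hidden]
h_last = h_decoder[-1]

# Option 2: split along dim 0 and concatenate the per-layer states
# along the feature axis -> [batch, num_layers * hidden]
h_concat = torch.cat(h_decoder.unbind(dim=0), dim=-1)

print(h_last.shape, h_concat.shape)
```

With option 2 the input size of W_decoder has to be num_layers * hidden_dim instead of hidden_dim.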

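The matrices W_decoder and W_encoder ensure that both the encoder and the decoder states get projected to the same dimension (regardless of what you did with the decoder layers), so you can do the summation.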

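The only remaining issue is that the encoder states have the max-length dimension. The trick here is that you need to add a dimension to the projected decoder state, so that the summation is broadcast and the projected decoder state gets summed with every encoder state. In PyTorch, just call unsqueeze on the 0th dimension of the projected decoder state; in TensorFlow, use expand_dims.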
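Putting the projection and the broadcast together, a minimal PyTorch sketch might look like the following. The sizes, the attention dimension attn_dim, the vector v, and the softmax step are my assumptions, following the usual additive-attention form; they are not taken from your implementation, and the tensors are random stand-ins.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
max_src_len, batch_size = 5, 4
enc_dim, dec_dim, attn_dim = 8, 8, 16

# Stand-ins for the real states.
h_encoder = torch.randn(max_src_len, batch_size, enc_dim)  # [src, batch, enc]
h_decoder_top = torch.randn(batch_size, dec_dim)           # [batch, dec]

# Both projections map into the same attn_dim, so the sum is well defined.
W_encoder = nn.Linear(enc_dim, attn_dim, bias=False)
W_decoder = nn.Linear(dec_dim, attn_dim, bias=False)
v = nn.Linear(attn_dim, 1, bias=False)  # assumed scoring vector

proj_enc = W_encoder(h_encoder)                   # [src, batch, attn]
proj_dec = W_decoder(h_decoder_top).unsqueeze(0)  # [1, batch, attn]

# Broadcasting over dim 0 adds the decoder projection to every source position.
scores = v(torch.tanh(proj_enc + proj_dec)).squeeze(-1)  # [src, batch]

# Normalise over source positions and build the context vector.
weights = torch.softmax(scores, dim=0)                    # [src, batch]
context = (weights.unsqueeze(-1) * h_encoder).sum(dim=0)  # [batch, enc]

print(scores.shape, context.shape)
```

If your encoder output is batch-first instead, the unsqueeze would go on dimension 1 and the softmax on dimension 1 as well.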

deep-learning

lstm

attention-model

seq2seq