What does the attention_size argument of tf.contrib.seq2seq.AttentionWrapper mean?

Question

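tf.contrib.seq2seq.AttentionWrapper has a parameter called attention_size. The documentation says "The basic attention wrapper is tf.contrib.seq2seq.AttentionWrapper. This wrapper accepts an RNNCell instance, an instance of AttentionMechanism, and an attention depth parameter (attention_size);", but what is attention depth? I found no mention of attention depth in either the Bahdanau or the Luong paper, and I could not make sense of the attention mechanism source code either. Can anyone explain the meaning of 'attention_size' and the principle behind it? Thanks!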

Answer 1

As far as I know, the authors of the original papers avoid mixing the base theory with implementation details. Thus, they define the attention/context size to be equal to the encoder hidden size (for a bi-directional LSTM, the size of the concatenated forward and backward states).

However, if the encoder hidden size is too large, computing attention over long sequences can consume a lot of time and memory.
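To make the link between context size and encoder hidden size concrete, here is a minimal pure-Python sketch of a context vector computed as a softmax-weighted sum of encoder states. The function name `context_vector` and the toy numbers are assumptions for illustration only; this is not TensorFlow code.

```python
import math


def context_vector(scores, encoder_states):
    """Softmax-normalize the scores, then take the weighted sum of encoder states."""
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    hidden_size = len(encoder_states[0])
    # Each context component mixes the same component of every encoder state.
    return [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(hidden_size)
    ]


# Three time steps, encoder hidden size 4 (toy values).
encoder_states = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
context = context_vector([0.0, 0.0, 0.0], encoder_states)
print(len(context))  # same width as one encoder state
```

Because the context is only a weighted average of encoder states, its width cannot differ from the encoder hidden size unless an extra projection is applied afterwards.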

Therefore, the TensorFlow implementation introduces an additional dense attention_layer with an adjustable attention_size option (renamed attention_layer_size in later versions), as follows:

  if attention_layer is not None:
    attention = attention_layer(array_ops.concat([cell_output, context], 1))
  else:
    attention = context
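The branch above can be illustrated with a small pure-Python sketch: the cell output and the context are concatenated and passed through a dense projection whose output width plays the role of the attention size. The helper `project_attention`, the constant weights, and the omission of bias and activation are assumptions made for illustration; the real layer uses learned weights.

```python
def project_attention(cell_output, context, weights=None):
    """Mirror the branch above: project concat([cell_output, context]) when weights exist.

    `weights` is a list with one column per output unit; each column has
    len(cell_output) + len(context) entries.
    """
    if weights is None:
        # No attention layer: the attention vector is the raw context.
        return context
    concat = cell_output + context  # list concatenation, like concat along axis 1
    return [sum(x * w for x, w in zip(concat, column)) for column in weights]


cell_output = [1.0, 1.0, 1.0]    # decoder cell size 3 (toy value)
context = [1.0, 1.0, 1.0, 1.0]   # encoder hidden size 4 (toy value)

# Hypothetical projection down to an attention size of 2.
weights = [[0.1] * 7, [0.2] * 7]

projected = project_attention(cell_output, context, weights)
unprojected = project_attention(cell_output, context)
print(len(projected), len(unprojected))
```

With the projection, the attention vector has as many entries as there are weight columns, regardless of how wide the encoder states are; without it, the width stays tied to the encoder hidden size.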

TL;DR: you can use the attention_size option to reduce the memory consumption of the attention mechanism when the encoder hidden size is too large.

tensorflow

attention-model