Visualizing attention activation in TensorFlow

Is there a way to visualize the attention weights on some input of a seq2seq model in TensorFlow, like in the figure linked above (from Bahdanau et al., 2014)? I found a related TensorFlow GitHub issue, but I could not figure out how to fetch the attention mask during a session.

I also wanted to visualize the attention weights of the TensorFlow seq2seq ops for my text-summarization task. My temporary workaround is to evaluate the attention mask tensor with session.run(), as mentioned above. Interestingly, the original seq2seq.py ops are considered a legacy version and are not easy to find on GitHub, so I simply took the seq2seq.py file from the 0.12.0 wheel distribution and modified it. To draw the heatmap I used the matplotlib package, which is very convenient.
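To illustrate the general trick in isolation: any intermediate tensor can be fetched with session.run() as long as you keep a Python reference to it. Below is a minimal, self-contained toy (not the textsum model; the placeholder scores stand in for the attention logits s):

import tensorflow as tf

# Toy graph: the softmax below stands in for the attention mask 'a'
# computed inside attention(); keep a reference to it so it can be fetched.
scores = tf.placeholder(tf.float32, shape=[None, None], name="scores")
attn_mask = tf.nn.softmax(scores)

with tf.Session() as sess:
    # Fetch the intermediate tensor directly, alongside (or instead of)
    # the model outputs.
    mask_val = sess.run(attn_mask, feed_dict={scores: [[1.0, 2.0, 3.0]]})
    print(mask_val)  # each row sums to 1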

The final output of the attention visualization for the news-headline textsum model looks like this:

I modified the code as below: https://github.com/rockingdingo/deepnlp/tree/master/deepnlp/textsum#attention-visualization

seq2seq_attn.py

# Find the attention mask tensor inside the function attention_decoder() -> attention().
# Add the attention mask tensor to the 'return' statement of every function that calls attention_decoder(),
# all the way up to the model_with_buckets() function, which is the final function I use for bucket training
# (see the shape-only sketch after this listing).

def attention(query):
  """Put attention masks on hidden using hidden_features and query."""
  ds = []  # Results of attention reads will be stored here.

  # some code

  for a in xrange(num_heads):
    with variable_scope.variable_scope("Attention_%d" % a):
      # some code

      s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
                              [2, 3])
      # This is the attention mask tensor we want to extract.
      # Note that it shadows the loop variable 'a', so with num_heads > 1
      # only the last head's mask survives to the return below.
      a = nn_ops.softmax(s)

      # some code

  # Add the mask 'a' to the return statement
  return ds, a
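As a shape-only illustration of the threading pattern described above (the function names below are hypothetical stand-ins, not the real attention_decoder()/model_with_buckets() signatures):

def _inner_attention():
    # Stands in for attention(): compute the mask alongside the usual result.
    mask = [0.1, 0.9]
    context = "context vector"
    return context, mask          # step 1: return the extra value

def _decoder():
    # Stands in for attention_decoder(): forward the mask unchanged.
    context, mask = _inner_attention()
    return context, mask          # step 2: pass it through

def _top_level_model():
    # Stands in for model_with_buckets(): expose the mask to the caller.
    context, mask = _decoder()
    return context, mask          # step 3: now session code can reach it

outputs, attn_mask = _top_level_model()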

seq2seq_model_attn.py

# Modify the model.step() function so that it also returns the attention mask tensors
self.outputs, self.losses, self.attn_masks = seq2seq_attn.model_with_buckets(…)

# Use session.run() to evaluate the attention masks for the chosen bucket
attn_out = session.run(self.attn_masks[bucket_id], input_feed)
attn_matrix = ...
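One possible way to fill in that last step, assuming attn_out comes back as a list with one (batch_size, input_length) array per decoder step (one entry per call to the modified attention() above); the exact shapes and axes depend on how the masks were collected in your model:

import numpy as np

# Take sample 0 of the batch and stack the per-step masks into a 2-D
# matrix: rows = decoder (output) steps, columns = encoder (input) positions.
attn_matrix = np.stack([step_mask[0] for step_mask in attn_out], axis=0)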

predict_attn.py and eval.py

# Use the plot_attention function in eval.py to visualize the 2D ndarray during prediction.

eval.plot_attention(attn_matrix[0:ty_cut, 0:tx_cut], X_label=X_label, Y_label=Y_label)
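plot_attention above names the helper in the deepnlp repo; for readers who do not want to pull in the repo, here is my own minimal matplotlib stand-in, which may differ from the actual eval.py implementation:

import numpy as np
import matplotlib.pyplot as plt

def plot_attention(data, X_label=None, Y_label=None):
    """Plot a 2-D attention matrix as a heatmap.

    data: ndarray with rows = output (decoder) steps and
          columns = input (encoder) positions.
    X_label: list of input tokens (column labels).
    Y_label: list of output tokens (row labels).
    """
    fig, ax = plt.subplots(figsize=(8, 8))
    heatmap = ax.pcolor(data, cmap=plt.cm.Blues)
    if X_label is not None:
        ax.set_xticks(np.arange(data.shape[1]) + 0.5)
        ax.set_xticklabels(X_label, rotation=45)
    if Y_label is not None:
        ax.set_yticks(np.arange(data.shape[0]) + 0.5)
        ax.set_yticklabels(Y_label)
    fig.colorbar(heatmap)
    plt.savefig("attention_heatmap.png", bbox_inches="tight")
    plt.close(fig)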

Perhaps in the future TensorFlow will have a better built-in way to extract and visualize the attention weight maps. Any thoughts?