Understanding the 'axis' parameter in the softmax activation function

Question

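Suppose I have an input tensor that carries one word embedding per time step. For example, with a time window of 5 and an embedding vector width of 64, I get the shape: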

(None, 5, 64, 1)

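I apply 4 filters with kernel shape (1, 64) to look for a specific word at each time step. Each filter produces 1 value per time step, indicating "word/meaning present" or "word/meaning absent". The resulting output tensor has shape: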

(None, 5, 1, 4)
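This shape follows from Keras' "same" padding rule: each spatial dimension becomes ceil(input_size / stride), independent of the kernel size, and the channel dimension becomes the number of filters. The helper conv2d_same_output_shape below is hypothetical, written only to spell out that arithmetic for a channels-last input using the strides from the code snippet:

```python
import math


def conv2d_same_output_shape(input_shape, filters, strides):
    """Hypothetical helper: output shape of a channels-last Conv2D with padding="same".

    With "same" padding each spatial size becomes ceil(size / stride);
    the kernel size does not change the output size.
    """
    batch, height, width, _channels = input_shape
    out_h = math.ceil(height / strides[0])
    out_w = math.ceil(width / strides[1])
    return (batch, out_h, out_w, filters)


print(conv2d_same_output_shape((None, 5, 64, 1), filters=4, strides=(1, 64)))
# (None, 5, 1, 4)
```

Height stays 5 because the stride along time is 1, and width collapses to 1 because a stride of 64 fits exactly once into the 64-wide embedding.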

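How should I set the 'axis' parameter of the softmax layer so that, at each time step, the outputs of all the convolutions are normalized together, as in a classification task?

More specifically, I want the output to look like this (height is time, width is channels):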

[[[.1, .4, .4, .1]]
 [[.9,  0,  0, .1]]
 [[.8,  0, .1, .1]]
 [[ 0,  1,  0,  0]]
 [[.6, .1, .1, .2]]]

That is, the components of each row (time step) should sum to 1; softmax should normalize only within each row.

Code snippet:
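The per-row normalization described above can be sketched in NumPy, assuming the standard definition softmax(z)_i = exp(z_i) / sum_j exp(z_j) applied along a single axis. The softmax helper and the random stand-in logits are illustrative assumptions, not Keras code. The sketch contrasts normalizing over axis 3 (the 4 filters, giving one distribution per time step) with axis 1 (each filter across the 5 time steps):

```python
import numpy as np


def softmax(x, axis=-1):
    """Softmax along one axis: exp(x) / sum(exp(x)) over that axis only."""
    # Subtract the max along the axis for numerical stability; softmax is
    # shift-invariant, so this does not change the result.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)


# Stand-in for the Conv2D output: batch of 1, 5 time steps, width 1, 4 filters
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 5, 1, 4))

per_timestep = softmax(logits, axis=3)  # normalize across the 4 filters
across_time = softmax(logits, axis=1)   # normalize each filter across time

print(per_timestep.sum(axis=3).ravel())  # five sums, each approximately 1
print(across_time.sum(axis=1).ravel())   # four sums, each approximately 1
```

Only the first variant matches the table above, where every row sums to 1; for a 4-D tensor, axis=3 and axis=-1 refer to the same (last) dimension.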

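from tensorflow.keras import layers, models

words_of_interest = 4    # number of filters, one per word of interest
embedding_length = 64    # width of each word-embedding vector

model = models.Sequential()
model.add(layers.Input(shape=(5, embedding_length, 1)))  # (time, embedding, 1)
model.add(layers.Conv2D(
    filters=words_of_interest,
    kernel_size=(1, embedding_length),
    strides=(1, embedding_length),
    padding="same"))
model.add(layers.Softmax(axis=3))  # <- is this correct for what I described above?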
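To see why axis 3 is the filter axis, the model can be emulated in NumPy. This is a sketch under stated assumptions: random stand-in inputs and weights, a zero bias, and the kernel laid out as (kernel_height, kernel_width, in_channels, filters). With kernel (1, 64), stride (1, 64) and "same" padding on a width-64 input, no padding is added, so each filter reduces to a dot product with one time step's embedding. The softmax helper is written for illustration and is not taken from Keras:

```python
import numpy as np

batch, time_steps, embedding_length, words_of_interest = 2, 5, 64, 4
rng = np.random.default_rng(42)

# Input in the question's layout: (batch, time, embedding, 1)
x = rng.normal(size=(batch, time_steps, embedding_length, 1))
# Assumed kernel layout (kh, kw, in_channels, filters) with random weights
kernel = rng.normal(size=(1, embedding_length, 1, words_of_interest))
bias = np.zeros(words_of_interest)

# Each filter output = dot product of one time step's embedding with the kernel
conv_out = np.einsum("btei,eif->btf", x, kernel[0]) + bias
conv_out = conv_out[:, :, np.newaxis, :]  # (batch, 5, 1, 4)


def softmax(z, axis):
    """Softmax along one axis, with max subtraction for numerical stability."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


probs = softmax(conv_out, axis=3)  # plays the role of layers.Softmax(axis=3)
print(conv_out.shape)              # (2, 5, 1, 4)
print(probs.sum(axis=3)[0, :, 0])  # five sums, each approximately 1
```

Because the filters end up on the last axis, axis=3 (or the default axis=-1) yields one probability distribution over the 4 words per time step.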

Answer 1

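The axis value does matter: softmax exponentiates the values along the axis you pass and divides them by their sum, so only the slices along that axis add up to 1. In the (None, 5, 1, 4) output the 4 filter channels sit on axis 3, so layers.Softmax(axis=3) (equivalently axis=-1, which is also the default) normalizes each time step across the filters, as described in the question. You can check the difference between the raw tensor and the normalized tensors with the following code.

import numpy as np
import tensorflow as tf

# 5 time steps x 4 filter outputs: one sample of the (None, 5, 1, 4) output
# with the singleton width axis dropped
array = np.random.random((5, 4))
tensor = tf.convert_to_tensor(array)

# A 2-D tensor only has axes 0 and 1; axis=2 or axis=3 would fail here
norm0 = tf.keras.activations.softmax(tensor, axis=0)  # each column sums to 1
norm1 = tf.keras.activations.softmax(tensor, axis=1)  # each row sums to 1

print(tf.reduce_sum(tensor[0]))  # first raw row: arbitrary sum
print(tf.reduce_sum(norm0[0]))   # first row after axis=0 softmax: generally not 1
print(tf.reduce_sum(norm1[0]))   # first row after axis=1 softmax: approximately 1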
