How do I perform a differentiable operation selection in TensorFlow?

Question

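I am trying to build an NN model that selects a mathematical operation based on a scalar input. The operation is chosen from the softmax output produced by the NN, and must then be applied to the scalar input to produce the final output. So far I have come up with applying argmax and one_hot to the softmax output to produce a mask, which is then applied to the concatenated matrix of values from all the candidate operations (as shown in the pseudocode below). The problem is that neither argmax nor one_hot appears to be differentiable. I am new to this, so any help would be highly appreciated. Thanks in advance.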

    import tensorflow as tf

    # perform softmax
    logits = tf.matmul(current_input, W) + b
    softmax = tf.nn.softmax(logits)

    # perform all possible operations on the input
    # (tf_op_1..tf_op_3 stand for the candidate operations)
    op_1_val = tf_op_1(current_input)
    op_2_val = tf_op_2(current_input)
    op_3_val = tf_op_3(current_input)
    values = tf.concat([op_1_val, op_2_val, op_3_val], 1)

    # create a mask
    argmax = tf.argmax(softmax, 1)
    mask = tf.one_hot(argmax, num_of_operations)

    # produce the output by masking out the operation results that were not selected
    output = values * mask

Answer 1

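I don't think this is possible. It is similar to the hard attention described in this paper. Hard attention is used in image captioning to let the model focus on only one part of the image at each step. Hard attention is not differentiable, but there are two ways to work around this: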

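1- Use reinforcement learning (RL): RL is designed for training models that make decisions. Even though the loss function will not backpropagate any gradient to the softmax used for the decision, you can use RL techniques to optimize the decision. As a simplified example, you can treat the loss as a penalty and send the node with the maximum value in the softmax layer a policy gradient proportional to that penalty, so that the score of the decision goes down when the decision is bad (that is, when it results in a high loss).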

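2- Use something like soft attention: instead of picking only one operation, mix them with weights based on the softmax. So instead of:

    output = values * mask

use:

    output = values * softmax

Now each operation's contribution shrinks toward zero the less the softmax selects it. This is easier to train than RL, but it will not work if you must completely remove the non-selected operations from the final result (set them exactly to zero).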
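To make the two options concrete, here is a minimal sketch, not a definitive implementation. It assumes TensorFlow 2.x with eager execution. The three entries in `OPS` are hypothetical stand-ins for `tf_op_1`..`tf_op_3`, and the squared-error loss, the `target` values, and the helper names are all made up for illustration. For the RL option it uses a REINFORCE-style surrogate that samples the operation from the softmax rather than taking the argmax, which is a common variation on the simplified scheme described in option 1.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical stand-ins for tf_op_1..tf_op_3 from the question.
OPS = [tf.square, tf.negative, lambda x: x + 1.0]
NUM_OPS = len(OPS)

# Selector parameters (W and b in the question): one linear layer.
W = tf.Variable(tf.random.normal([1, NUM_OPS], stddev=0.1))
b = tf.Variable(tf.zeros([NUM_OPS]))


def selector_logits(x):
    return tf.matmul(x, W) + b


def candidate_values(x):
    # Apply every op to x of shape [batch, 1]; result is [batch, NUM_OPS].
    return tf.concat([op(x) for op in OPS], axis=1)


def hard_select(x):
    # The question's approach: argmax + one_hot mask.
    mask = tf.one_hot(tf.argmax(selector_logits(x), axis=1), NUM_OPS)
    return tf.reduce_sum(candidate_values(x) * mask, axis=1)


def soft_select(x):
    # Option 2: softmax-weighted mixture of all op results.
    probs = tf.nn.softmax(selector_logits(x))
    return tf.reduce_sum(candidate_values(x) * probs, axis=1)


def reinforce_surrogate(x, target):
    # Option 1 (assumed REINFORCE-style variant): sample one op per example,
    # treat the resulting loss as a fixed penalty, and weight the
    # log-probability of the sampled op by that penalty.
    logits = selector_logits(x)
    action = tf.random.categorical(logits, 1)[:, 0]
    mask = tf.one_hot(action, NUM_OPS)
    out = tf.reduce_sum(candidate_values(x) * mask, axis=1)
    penalty = tf.stop_gradient(tf.square(out - target))
    log_prob = tf.reduce_sum(tf.nn.log_softmax(logits) * mask, axis=1)
    return tf.reduce_mean(penalty * log_prob)


x = tf.constant([[2.0], [3.0]])
target = tf.constant([4.0, 9.0])  # what tf.square would produce

with tf.GradientTape(persistent=True) as tape:
    hard_loss = tf.reduce_mean(tf.square(hard_select(x) - target))
    soft_loss = tf.reduce_mean(tf.square(soft_select(x) - target))
    rl_loss = reinforce_surrogate(x, target)

hard_grad = tape.gradient(hard_loss, W)  # None: no differentiable path to W
soft_grad = tape.gradient(soft_loss, W)  # ordinary dense gradient
rl_grad = tape.gradient(rl_loss, W)      # score-function gradient estimate
del tape
```

With the hard mask, the selector weights receive no gradient at all, which is the problem described in the question. The soft mixture gives a regular gradient, and the surrogate loss gives a noisy estimate that pushes down the probability of sampled operations in proportion to the penalty they incur. Either `soft_grad` or `rl_grad` could be passed to an optimizer's `apply_gradients` in a training loop.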

Here is another answer that discusses hard and soft attention, which you may find helpful:

