Using a custom step activation function in Keras results in an "An operation has `None` for gradient." error. How can I resolve this?

Question

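I am building an autoencoder and want to encode my values into a logical matrix. However, when I use my custom step activation function in one of the intermediate layers (all other layers use 'relu'), Keras raises this error: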

An operation has `None` for gradient.

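I have tried the hard-sigmoid function, but it does not fit my problem because it still produces intermediate values when I need strictly binary ones. I know that my function has no gradient at most points, but is it possible to use some other function for the gradient computation and still use the step function for the accuracy and loss calculations?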

My activation function:

def binary_activation(x):
    ones = tf.ones(tf.shape(x), dtype=x.dtype.base_dtype)
    zeros = tf.zeros(tf.shape(x), dtype=x.dtype.base_dtype)
    return keras.backend.switch(x > 0.5, ones, zeros)

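I would like to be able to train the network with the binary step activation function and then use it as a typical autoencoder. Something similar to the binary feature maps used in this paper.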

Answer 1

As mentioned here, you could use tf.custom_gradient to define a "back-propagatable" gradient for your activation function.

Perhaps something like this:

@tf.custom_gradient
def binary_activation(x):

    ones = tf.ones(tf.shape(x), dtype=x.dtype.base_dtype)
    zeros = tf.zeros(tf.shape(x), dtype=x.dtype.base_dtype)

    def grad(dy):
        return ...  # TODO define gradient
    return keras.backend.switch(x > 0.5, ones, zeros), grad
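
The snippet above leaves the gradient itself undefined. One common way to fill it in, which is an assumption here and not something the answer specifies, is a straight-through estimator: keep the hard step in the forward pass and, in the backward pass, pass the upstream gradient through only where the input lies in [0, 1] (the derivative of clipping x to that range). A minimal sketch under that assumption, written against TensorFlow 2 eager execution and using tf.cast in place of keras.backend.switch:

```python
import tensorflow as tf


@tf.custom_gradient
def binary_activation(x):
    # Forward pass: hard step at 0.5, as in the question.
    y = tf.cast(x > 0.5, x.dtype)

    def grad(dy):
        # Backward pass (assumed choice, not from the original answer):
        # straight-through estimator that lets the upstream gradient
        # through only where 0 <= x <= 1.
        inside = tf.logical_and(x >= 0.0, x <= 1.0)
        return dy * tf.cast(inside, dy.dtype)

    return y, grad


# Quick check that the gradient is no longer None.
x = tf.constant([-0.5, 0.2, 0.7, 1.5])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = binary_activation(x)
    loss = tf.reduce_sum(y)
grad = tape.gradient(loss, x)
```

With a gradient defined this way, the function could be passed to a layer like any other callable activation, for example activation=binary_activation. Whether this surrogate gradient trains well depends on the model, so treat the [0, 1] window as a starting point and not a recommendation.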


Tags: python, gradient, keras, tensorflow, activation-function