What activation function should I use to enforce rounding-like behaviour?

Question

I need an activation function that rounds my tensors.

The derivative (gradient) of round() is 0 (or None in TensorFlow), which makes it unusable as an activation function.
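
To make the problem concrete, here is a small framework-free sketch in plain Python (not TensorFlow) that estimates the slope of round() with a central difference. The helper name numeric_slope is made up for this illustration.

```python
def numeric_slope(f, x, h=1e-4):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Away from the half-integers, round() is locally constant, so the
# estimated slope is 0 and gradient descent receives no signal.
for x in (0.2, 1.3, 2.9):
    print(x, numeric_slope(round, x))
```

The only places where round() changes are the jumps at the half-integers, where the derivative is undefined rather than useful.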

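I am looking for a function that enforces rounding-like behaviour, so that my model's outputs are not just approximate numbers (my labels are integers).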

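I know that the composition tanh ○ sigmoid is used to force only the values {-1, 0, 1} to flow through the model, so is there some combination of differentiable functions that simulates rounding?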

Answer 1

Perhaps the softmax cross-entropy loss, tf.nn.softmax_cross_entropy_with_logits_v2, is what you are looking for; see

https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2

Also have a look at

https://deepnotes.io/softmax-crossentropy
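
As I read this suggestion, the idea is to treat each possible integer label as a class, train with cross-entropy, and take the argmax at prediction time, which is an exact integer by construction. Below is a minimal plain-Python sketch of softmax cross-entropy for a single example. The function name and the sample logits are invented for illustration; in a real model you would call the TensorFlow function linked above.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

logits = [0.5, 2.0, -1.0]  # one score per candidate integer 0, 1, 2
# The prediction is the index of the largest score, so it is always an integer.
predicted = max(range(len(logits)), key=lambda k: logits[k])
loss_right = softmax_cross_entropy(logits, 1)  # label agrees with the argmax
loss_wrong = softmax_cross_entropy(logits, 2)  # label disagrees, larger loss
```

This only applies when the labels come from a small, fixed set of integers. Note that tf.nn.softmax_cross_entropy_with_logits_v2 expects the labels as probability distributions (for example one-hot vectors), while tf.nn.sparse_softmax_cross_entropy_with_logits takes integer class indices directly.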

Answer 2

If you want to approximate round on the real line, you can do something like this:

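import tensorflow as tf

def approx_round(x, steepness=1):
    # Integer part, plus a sigmoid step centred on a fractional part of 0.5.
    floor_part = tf.floor(x)
    remainder = tf.math.floormod(x, 1.0)  # tf.mod in TensorFlow 1.x
    return floor_part + tf.sigmoid(steepness * (remainder - 0.5))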

In fact, there are several ways to register your own gradient in TensorFlow. However, I am not very familiar with implementing that part, since I don't use Keras/TensorFlow often.
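
One way to attach a hand-written gradient is the tf.custom_gradient decorator. The sketch below assumes TensorFlow 2.x in eager mode and uses a different technique from the smooth approximation above: it keeps exact rounding in the forward pass and simply passes the incoming gradient through unchanged in the backward pass (often called a straight-through estimator). The function name round_with_identity_grad is made up for this example.

```python
import tensorflow as tf

@tf.custom_gradient
def round_with_identity_grad(x):
    """Forward pass: exact rounding. Backward pass: behave like identity."""
    def grad(upstream):
        # Assumption for this sketch: pass the incoming gradient through as is.
        return upstream
    return tf.round(x), grad

x = tf.constant([0.2, 1.7, 2.4])
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so it must be watched explicitly
    y = round_with_identity_grad(x)
    loss = tf.reduce_sum(y)
dx = tape.gradient(loss, x)  # all ones, instead of zeros or None
```

The same decorator could return any other surrogate in grad, for instance the sigmoid-based derivative given in this answer.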

The gradient of this approximation can be written as a function, as follows:

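def approx_round_grad(x, steepness=1):
    remainder = tf.math.floormod(x, 1.0)  # tf.mod in TensorFlow 1.x
    sig = tf.sigmoid(steepness * (remainder - 0.5))
    # Chain rule: d/dx sigmoid(s*(r - 0.5)) = s * sig * (1 - sig).
    return steepness * sig * (1 - sig)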
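
As a sanity check on the derivative, here is a plain-Python mirror of the two functions (the _py names are made up, and no TensorFlow is involved) compared against a central difference. By the chain rule the derivative of sigmoid(steepness*(r - 0.5)) carries a factor of steepness; with the default steepness=1 that factor makes no difference.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def approx_round_py(x, steepness=1.0):
    # Plain-Python mirror of approx_round, for checking only.
    return math.floor(x) + sigmoid(steepness * (x % 1 - 0.5))

def approx_round_grad_py(x, steepness=1.0):
    sig = sigmoid(steepness * (x % 1 - 0.5))
    return steepness * sig * (1.0 - sig)  # chain rule brings out `steepness`

x, s, h = 2.3, 10.0, 1e-6
numeric = (approx_round_py(x + h, s) - approx_round_py(x - h, s)) / (2 * h)
analytic = approx_round_grad_py(x, s)
print(numeric, analytic)  # the two values should agree closely
```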

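To be clear, this approximation assumes that you use a "steep enough" steepness parameter, because the sigmoid function never exactly reaches 0 or 1 except in the limit of large arguments.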
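
To get a feel for what "steep enough" means, this plain-Python sketch (the name soft_round is made up, and it mirrors approx_round without TensorFlow) evaluates the approximation at 2.9 for a few steepness values.

```python
import math

def soft_round(x, steepness=1.0):
    # Plain-Python mirror of approx_round, used only to print numbers.
    remainder = x % 1
    return math.floor(x) + 1.0 / (1.0 + math.exp(-steepness * (remainder - 0.5)))

for steepness in (1, 10, 100):
    print(steepness, soft_round(2.9, steepness))
```

The results are roughly 2.60, 2.98 and 3.00. With a small steepness the output is far from the rounded value, and the function also jumps at every integer because the sigmoid has not saturated there. With a very large steepness the output is close to exact rounding, but the gradient is nearly zero everywhere except near the half-integers.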

To do something like a half-sine approximation instead, you can use the following:

import numpy as np
import tensorflow as tf

def approx_round_sin(x, width=0.1):
    if width > 1 or width <= 0:
        raise ValueError('Width must be between zero (exclusive) and one (inclusive)')
    floor_part = tf.floor(x)
    remainder = tf.math.floormod(x, 1.0)  # tf.mod in TensorFlow 1.x
    return floor_part + clipped_sin(remainder, width)

def clipped_sin(x, width):
    # Half-sine ramp from 0 to 1 across |x - 0.5| < width/2; 0 below it, 1 above it.
    half_width = width / 2
    sin_part = (1 + tf.sin(np.pi * ((x - 0.5) / width))) / 2
    whole = sin_part * tf.cast(tf.abs(x - 0.5) < half_width, x.dtype)
    # >= so that the upper edge of the ramp maps to 1 rather than 0.
    whole += tf.cast(x >= 0.5 + half_width, x.dtype)
    return whole

def approx_round_grad_sin(x, width=0.1):
    if width > 1 or width <= 0:
        raise ValueError('Width must be between zero (exclusive) and one (inclusive)')
    remainder = tf.math.floormod(x, 1.0)  # tf.mod in TensorFlow 1.x
    return clipped_cos(remainder, width)

def clipped_cos(x, width):
    # Derivative of clipped_sin: nonzero only inside the ramp.
    half_width = width / 2
    cos_part = np.pi * tf.cos(np.pi * ((x - 0.5) / width)) / (2 * width)
    return cos_part * tf.cast(tf.abs(x - 0.5) < half_width, x.dtype)
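
Unlike the sigmoid version, the half-sine version reaches 0 and 1 exactly outside a window of the given width around the half-integer. The plain-Python mirror below (the _py names are made up, and no TensorFlow is involved) shows this on a few sample inputs.

```python
import math

def clipped_sin_py(r, width):
    """Half-sine ramp from 0 to 1 centred on r = 0.5; flat elsewhere."""
    half_width = width / 2
    if abs(r - 0.5) < half_width:
        return (1 + math.sin(math.pi * (r - 0.5) / width)) / 2
    return 1.0 if r > 0.5 else 0.0

def approx_round_sin_py(x, width=0.1):
    return math.floor(x) + clipped_sin_py(x % 1, width)

samples = [2.3, 2.5, 2.52, 2.7]
values = [approx_round_sin_py(x) for x in samples]
print(values)  # exact integers outside the window, a smooth ramp inside it
```

The trade-off is that the gradient is exactly zero outside the window, so a narrow width gives sharper rounding but fewer inputs that receive any training signal.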

python

derivative

neural-network

tensorflow

activation-function