How is Hard Sigmoid defined

Question

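I am working on deep networks using Keras. There is an activation called "hard sigmoid". What is its mathematical definition?

I know what a sigmoid is. Someone asked a similar question on Quora: https://www.quora.com/What-is-hard-sigmoid-in-artificial-neural-networks-Why-is-it-faster-than-standard-sigmoid-Are-there-any-disadvantages-over-the-standard-sigmoid

But I could not find the precise mathematical definition anywhere.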

Answer 1

Since Keras supports both TensorFlow and Theano, the exact implementation may differ between backends. I will cover only Theano. For the Theano backend, Keras uses T.nnet.hard_sigmoid, which in turn is a linearly approximated standard sigmoid:

slope = tensor.constant(0.2, dtype=out_dtype)
shift = tensor.constant(0.5, dtype=out_dtype)
x = (x * slope) + shift
x = tensor.clip(x, 0, 1)

In other words, it is: max(0, min(1, x*0.2 + 0.5))
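
As a minimal NumPy sketch of the formula above (this is not the Theano or Keras source, and the name hard_sigmoid_theano is made up for illustration), the function is 0 for x <= -2.5, 1 for x >= 2.5, and linear in between:

```python
import numpy as np

def hard_sigmoid_theano(x):
    # Slope 0.2 and shift 0.5 as in the snippet above, clipped to [0, 1].
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

# Points below, at, and above the two saturation thresholds, plus the midpoint.
x = np.array([-3.0, -2.5, 0.0, 2.5, 3.0])
y = hard_sigmoid_theano(x)
print(y)
```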

Answer 2

For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it is defined as:

σ is the “hard sigmoid” function: σ(x) = clip((x + 1)/2, 0, 1) = max(0, min(1, (x + 1)/2))

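The intent is to provide a probability value (hence constraining it to lie between 0 and 1) for use in stochastic binarization of neural network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid function to set the parameter x to +1 with probability p, or to -1 with probability 1-p.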
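
The sampling rule described above could be sketched roughly as follows. This is an illustration in NumPy, not the paper's code; the function names, the seed, and the sample count are arbitrary choices made for the example:

```python
import numpy as np

def hard_sigmoid_bnn(x):
    # sigma(x) = clip((x + 1) / 2, 0, 1), as in the quoted definition.
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(x, rng):
    # Return +1 with probability p = sigma(x), otherwise -1.
    p = hard_sigmoid_bnn(x)
    return np.where(rng.random(x.shape) < p, 1.0, -1.0)

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Draw many binarized copies of x and estimate P(+1) for each entry.
samples = np.stack([stochastic_binarize(x, rng) for _ in range(10_000)])
freq_plus_one = (samples == 1.0).mean(axis=0)
print(freq_plus_one)  # should be close to hard_sigmoid_bnn(x)
```

Entries with x <= -1 always come out as -1 and entries with x >= 1 always come out as +1, because the probability is clipped to exactly 0 or 1 there.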

[1] https://arxiv.org/abs/1602.02830 - "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio (submitted 9 Feb 2016 (v1), last revised 17 Mar 2016 (this version, v3))

Answer 3

It is

  clip((x + 1)/2, 0, 1)

In code terms:

  max(0, min(1, (x + 1)/2))

Answer 4

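The hard sigmoid is normally a piecewise linear approximation of the logistic sigmoid function. Depending on which properties of the original sigmoid you want to keep, you can use a different approximation.

I personally like to keep the function correct at zero, i.e. σ(0) = 0.5 (shift) and σ'(0) = 0.25 (slope). This could be coded as follows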

import numpy as np

def hard_sigmoid(x):
    return np.maximum(0, np.minimum(1, (x + 2) / 4))
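
The three definitions in these answers all have the form clip(slope * x + 0.5, 0, 1) and differ only in the slope: 0.2 (Theano), 0.25 (this answer), and 0.5 (Courbariaux et al.). As a rough way to see how each one relates to the logistic sigmoid, the sketch below measures the largest absolute gap on a grid. The grid range [-6, 6] and the helper names are assumptions made for this example:

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x, slope, shift=0.5):
    # Generic piecewise linear form shared by all three definitions.
    return np.clip(slope * x + shift, 0.0, 1.0)

x = np.linspace(-6.0, 6.0, 1201)
max_err = {}
for slope in (0.2, 0.25, 0.5):
    # Largest pointwise gap between the approximation and the true sigmoid.
    max_err[slope] = float(np.max(np.abs(hard_sigmoid(x, slope) - sigmoid(x))))
    print("slope", slope, "max abs error:", max_err[slope])
```

On this grid the gap grows with the slope, which fits the different goals: the slope 0.5 variant is meant as a binarization probability rather than as a close fit to the logistic curve.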

Tags: math, theano, deep-learning, keras, tensorflow