Details about alpha in tf.nn.leaky_relu( features, alpha=0.2, name=None )

I am trying to use leaky_relu as the activation function for my hidden layers. The parameter alpha is documented as:

slope of the activation function at x < 0

What does this mean? How do different values of alpha affect the model's results?
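For context, a minimal sketch of the kind of usage described above, assuming a small tf.keras model; the layer widths and the alpha value of 0.1 are placeholders for illustration only:

```python
import tensorflow as tf

# Minimal sketch: leaky_relu used as the hidden-layer activation.
# Layer widths and alpha=0.1 are illustrative placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation=lambda x: tf.nn.leaky_relu(x, alpha=0.1)),
    tf.keras.layers.Dense(1),
])
model.summary()
```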

In-depth explanations of ReLU and its variants can be found at the following links:

  1. https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
  2. https://medium.com/@himanshuxd/activation-functions-sigmoid-relu-leaky-relu-and-softmax-basics-for-neural-networks-and-deep-8d9c70eed91e

With a regular ReLU, the main drawback is that the input to the activation can become negative as a result of the operations performed in the network, which leads to the so-called "dying ReLU" problem:

the gradient is 0 whenever the unit is not active. This could lead to cases where a unit never activates as a gradient-based optimization algorithm will not adjust the weights of a unit that never activates initially. Further, like the vanishing gradients problem, we might expect learning to be slow when training ReLU networks with constant 0 gradients.
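A short sketch of that symptom, assuming TensorFlow 2.x and tf.GradientTape: for negative inputs the gradient of tf.nn.relu is exactly zero, so a gradient-based optimizer has nothing to propagate through those units.

```python
import tensorflow as tf

# Dying-ReLU symptom: the gradient of ReLU is exactly 0 for negative inputs,
# so gradient descent never adjusts the weights feeding a unit stuck below zero.
x = tf.Variable([-2.0, -0.5, 0.5, 2.0])
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)
print(tape.gradient(y, x).numpy())  # [0. 0. 1. 1.]
```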

So Leaky ReLU replaces that zero slope with some small value such as 0.001 (the "alpha" parameter). With leaky ReLU the function becomes f(x) = max(0.001x, x). The gradient on the negative side is now the non-zero value 0.001, so the unit keeps learning instead of hitting a dead end.
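A sketch, again assuming TensorFlow 2.x, comparing the gradient of tf.nn.leaky_relu for a few alpha values; for x < 0 the gradient equals alpha, which is exactly the "slope of the activation function at x < 0" from the docstring:

```python
import tensorflow as tf

# For x < 0 the gradient of leaky_relu equals alpha; for x > 0 it is 1.
x = tf.Variable([-2.0, -0.5, 0.5, 2.0])
for alpha in (0.001, 0.01, 0.2):
    with tf.GradientTape() as tape:
        y = tf.nn.leaky_relu(x, alpha=alpha)
    print(alpha, tape.gradient(y, x).numpy())
# 0.001 [0.001 0.001 1.    1.   ]
# 0.01  [0.01  0.01  1.    1.   ]
# 0.2   [0.2   0.2   1.    1.   ]
```

In practice, alpha controls how much signal leaks through for negative inputs: alpha = 0 recovers plain ReLU, while values close to 1 make the unit nearly linear. Commonly used values are small, roughly 0.01 to 0.3 (tf.nn.leaky_relu defaults to 0.2).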