Initial bias values for a neural network
I am currently building a CNN in TensorFlow, and I am initializing my weight matrices using He normal initialization. However, I am unsure how I should initialize my bias values. I am using ReLU as the activation function between each convolutional layer. Is there a standard method for initializing the bias values?
import numpy as np
import tensorflow as tf

# Approximate Xavier weight initialization with the ReLU correction described by He:
# std = sqrt(2 / fan_in), where fan_in = kernel_height * kernel_width * in_channels.
def xavier_over_two(shape):
    fan_in = shape[0] * shape[1] * shape[2]
    std = np.sqrt(2.0 / fan_in)
    return tf.random_normal(shape, stddev=std)

def bias_init(shape):
    return #???
Initializing the biases. It is possible and common to initialize the biases to zero, since the asymmetry breaking is provided by the small random numbers in the weights. For ReLU non-linearities, some people like to use a small constant value such as 0.01 for all biases, because this ensures that all ReLU units fire in the beginning and therefore obtain and propagate some gradient. However, it is not clear whether this provides a consistent improvement (in fact, some results seem to indicate that it performs worse), and it is more common to simply use 0 bias initialization.
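Under that advice, a minimal sketch of what the bias_init from the question could look like (the value parameter and the tf.constant usage here are illustrative choices, not the only way to do it):

import tensorflow as tf

def bias_init(shape, value=0.0):
    # Common default: all-zero biases; symmetry breaking already comes from the weights.
    # For ReLU layers, some pass value=0.01 so every unit fires at the start,
    # although the benefit of that is debatable, as noted above.
    return tf.constant(value, shape=shape, dtype=tf.float32)

# Example usage alongside the weight initializer from the question:
# b_conv1 = tf.Variable(bias_init([64]))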
Note the special case of the bias in the final layer. As Andrej Karpathy explains in his Recipe for Training Neural Networks:
init well. Initialize the final layer weights correctly. E.g. if you are regressing some values that have a mean of 50 then initialize the final bias to 50. If you have an imbalanced dataset of a ratio 1:10 of positives:negatives, set the bias on your logits such that your network predicts probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves where in the first few iterations your network is basically just learning the bias.
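As a concrete sketch of the classification case in that quote, assuming a sigmoid/logistic output: the bias that makes the network predict probability p at initialization is the logit of p.

import numpy as np

p = 0.1                             # desired initial P(positive) for a 1:10 dataset
final_bias = np.log(p / (1.0 - p))  # sigmoid(final_bias) == p, roughly -2.197

# For the regression example (targets with mean ~50), one would instead
# initialize the final layer's bias directly to 50.0.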