How to do Xavier initialization on TensorFlow
I'm porting my Caffe network over to TensorFlow, but it doesn't seem to have xavier initialization. I'm using truncated_normal, but this seems to be making training a lot harder.
I looked around and couldn't find anything built in. However, according to this:
http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
Xavier initialization is just sampling from a (usually Gaussian) distribution where the variance is a function of the number of neurons. tf.random_normal can do that for you, you just need to compute the stddev (i.e. from the number of neurons represented by the weight matrix you're trying to initialize).
@Aleph7, Xavier/Glorot initialization depends on the number of incoming connections (fan_in), the number of outgoing connections (fan_out), and the kind of activation function (sigmoid or tanh) of the neuron. See this: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
So now, to your question. This is how I would do it in TensorFlow:
(fan_in, fan_out) = ...
low = -4*np.sqrt(6.0/(fan_in + fan_out)) # use 4 for sigmoid, 1 for tanh activation
high = 4*np.sqrt(6.0/(fan_in + fan_out))
return tf.Variable(tf.random_uniform(shape, minval=low, maxval=high, dtype=tf.float32))
Note that we should be sampling from a uniform distribution, and not the normal distribution as suggested in the other answer.
Incidentally, I wrote a post yesterday for something different using TensorFlow that happens to also use Xavier initialization. If you're interested, there's also a python notebook with an end-to-end example: https://github.com/delip/blog-stuff/blob/master/tensorflow_ufp.ipynb
A nice wrapper around tensorflow called prettytensor gives an implementation in the source code (copied directly from here):
def xavier_init(n_inputs, n_outputs, uniform=True):
  """Set the parameter initialization using the method described.
  This method is designed to keep the scale of the gradients roughly the same
  in all layers.
  Xavier Glorot and Yoshua Bengio (2010):
           Understanding the difficulty of training deep feedforward neural
           networks. International conference on artificial intelligence and
           statistics.
  Args:
    n_inputs: The number of input nodes into each output.
    n_outputs: The number of output nodes for each input.
    uniform: If true use a uniform distribution, otherwise use a normal.
  Returns:
    An initializer.
  """
  if uniform:
    # 6 was used in the paper.
    init_range = math.sqrt(6.0 / (n_inputs + n_outputs))
    return tf.random_uniform_initializer(-init_range, init_range)
  else:
    # 3 gives us approximately the same limits as above since this repicks
    # values greater than 2 standard deviations from the mean.
    stddev = math.sqrt(3.0 / (n_inputs + n_outputs))
    return tf.truncated_normal_initializer(stddev=stddev)
Since version 0.8 there is a Xavier initializer, see here for the docs.
You can use it like this:
W = tf.get_variable("W", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer())
TF-contrib has xavier_initializer. Here is an example of how to use it:
import tensorflow as tf

a = tf.get_variable("a", shape=[4, 4], initializer=tf.contrib.layers.xavier_initializer())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))
In addition to this, tensorflow has other initializers:
Just to add another example on how to define a tf.Variable initialized using Xavier and Yoshua's method:
graph = tf.Graph()
with graph.as_default():
    ...
    initializer = tf.contrib.layers.xavier_initializer()
    w1 = tf.Variable(initializer(w1_shape))
    b1 = tf.Variable(initializer(b1_shape))
    ...
This kept me from having nan values on my loss function due to numerical instabilities when using multiple layers with RELUs.
Pass the kernel_initializer parameter to tf.layers.conv2d, tf.layers.conv2d_transpose, tf.layers.Dense, etc.
e.g.
layer = tf.layers.conv2d(
    input, 128, 5, strides=2, padding='SAME',
    kernel_initializer=tf.contrib.layers.xavier_initializer())
https://www.tensorflow.org/api_docs/python/tf/layers/conv2d
https://www.tensorflow.org/api_docs/python/tf/layers/conv2d_transpose
Just in case you want to use one line as you do with:
W = tf.Variable(tf.truncated_normal((n_prev, n), stddev=0.1))
you can do:
W = tf.Variable(tf.contrib.layers.xavier_initializer()((n_prev, n)))
In Tensorflow 2.0 both tf.contrib.* and tf.get_variable() are deprecated. In order to do Xavier initialization you now have to switch to:
init = tf.initializers.GlorotUniform()
var = tf.Variable(init(shape=shape))
# or a one-liner with slightly confusing brackets
var = tf.Variable(tf.initializers.GlorotUniform()(shape=shape))
Glorot uniform and Xavier uniform are two different names for the same initialization type. If you want to know more about how to use initializations in TF2.0 with or without Keras, refer to the documentation.
Tensorflow 1:
W1 = tf.get_variable("W1", [25, 12288],
                     initializer=tf.contrib.layers.xavier_initializer(seed=1))
Tensorflow 2:
W1 = tf.get_variable("W1", [25, 12288],
                     initializer=tf.random_normal_initializer(seed=1))