Mean of Tensorflow Keras's Glorot Normal Initializer is not zero

According to the documentation for Glorot Normal, the mean of the normal distribution the initial weights are drawn from should be zero:

Draws samples from a truncated normal distribution centered on 0

But it doesn't appear to be zero. Am I missing something?

Please see the following code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

print(tf.__version__)

initializer = tf.keras.initializers.GlorotNormal(seed=1234)


model = Sequential([Dense(units=3, input_shape=[1], kernel_initializer=initializer,
                          bias_initializer=initializer),
                    Dense(units=1, kernel_initializer=initializer,
                          bias_initializer=initializer)])

batch_size = 1

x = np.array([-1.0, 0, 1, 2, 3, 4.0], dtype='float32')
y = np.array([-3, -1.0, 1, 3.0, 5.0, 7.0], dtype='float32')

x = np.reshape(x, (-1, 1))

# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x, y))
train_dataset = train_dataset.shuffle(buffer_size=64).batch(batch_size)

epochs = 1
learning_rate = 1e-3

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)

for epoch in range(epochs):

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = tf.keras.losses.MSE(y_batch_train, logits)               
        
       
        Initial_Weights_1st_Hidden_Layer = model.trainable_weights[0]
       
        Mean_Weights_Hidden_Layer = tf.reduce_mean(Initial_Weights_1st_Hidden_Layer)
                          
        Initial_Weights_Output_Layer = model.trainable_weights[2]
        
        Mean_Weights_Output_Layer = tf.reduce_mean(Initial_Weights_Output_Layer)  
               
        Initial_Bias_1st_Hidden_Layer = model.trainable_weights[1]
        
        Mean_Bias_Hidden_Layer = tf.reduce_mean(Initial_Bias_1st_Hidden_Layer)      
        
        Initial_Bias_Output_Layer = model.trainable_weights[3]
        
        Mean_Bias_Output_Layer = tf.reduce_mean(Initial_Bias_Output_Layer)
        
        if epoch == 0 and step == 0:
            
            print('\n Initial Weights of First-Hidden Layer = ', Initial_Weights_1st_Hidden_Layer)
            print('\n Mean of Weights of Hidden Layer = %s' % Mean_Weights_Hidden_Layer.numpy())

            print('\n Initial Weights of Second-Hidden/Output Layer = ', Initial_Weights_Output_Layer)
            print('\n Mean of Weights of Output Layer = %s' % Mean_Weights_Output_Layer.numpy())

            print('\n Initial Bias of First-Hidden Layer = ', Initial_Bias_1st_Hidden_Layer)
            print('\n Mean of Bias of Hidden Layer = %s' % Mean_Bias_Hidden_Layer.numpy())

            print('\n Initial Bias of Second-Hidden/Output Layer = ', Initial_Bias_Output_Layer)
            print('\n Mean of Bias of Output Layer = %s' % Mean_Bias_Output_Layer.numpy())

        # Apply one SGD step; the weights are read above *before* this update,
        # so the values printed at step 0 are still the initial ones.
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

Because you are not drawing many samples from that distribution. The distribution's mean is zero in expectation, but the *sample* mean of a handful of draws can land far from it; it only approaches zero as the number of draws grows (the law of large numbers).

initializer = tf.keras.initializers.GlorotNormal(seed=1234)
mean = tf.reduce_mean(initializer(shape=(1, 3))).numpy()
print(mean) # -0.29880756
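
For intuition, the GlorotNormal docs give the standard deviation as sqrt(2 / (fan_in + fan_out)), so for a (1, 3) kernel each draw is already spread out, and the mean of only three draws is correspondingly noisy. A rough back-of-the-envelope sketch (the numbers are illustrative):

import math

fan_in, fan_out = 1, 3
n = fan_in * fan_out                          # number of weights in a (1, 3) kernel
stddev = math.sqrt(2.0 / (fan_in + fan_out))  # ~0.707, per the GlorotNormal docs
print(stddev, stddev / math.sqrt(n))          # ~0.707 and ~0.408: the mean of
                                              # 3 draws easily lands ~0.3 from 0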

But if you increase the number of samples:

initializer = tf.keras.initializers.GlorotNormal(seed=1234)
mean = tf.reduce_mean(initializer(shape=(1, 500))).numpy()
print(mean) # 0.003004579

The same thing applies to your example: if you increase the units of the first Dense layer to 500, you should see 0.003004579 with the same seed.
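
To see the convergence directly, here is a minimal sketch (not part of the original answer; the exact printed values depend on your TensorFlow version and seed handling) that prints the sample mean for increasing layer widths:

import tensorflow as tf

for units in (3, 50, 500, 5000):
    initializer = tf.keras.initializers.GlorotNormal(seed=1234)
    sample = initializer(shape=(1, units))
    # The sample mean drifts toward the distribution mean of 0
    # as the number of draws grows.
    print(units, tf.reduce_mean(sample).numpy())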