How do you set the axes parameter in TensorFlow moments for batch normalization?

I am planning to implement a batch-normalization function similar to this blog (or just using tf.nn.batch_normalization), using tf.nn.moments to compute the mean and variance, but I wish to do it for temporal data of both vector and image types. In general, I am having some trouble understanding how to set the axes argument in tf.nn.moments correctly.

My input data for vector sequences has shape (batch, timesteps, channels), and my input data for image sequences has shape (batch, timesteps, height, width, 3) (note that they are RGB images). In both cases I would like normalization to happen across the entire batch and across timesteps, meaning I am not trying to maintain separate mean/variance for different timesteps.

How does one correctly set axes for the different data types (e.g. image, vector) and for temporal/non-temporal data?

The simplest way to think about it: the axes passed to axes will be collapsed, and the statistics will be computed by slicing across those axes. Example:

import tensorflow as tf

x = tf.random.uniform((8, 10, 4))

print(x, '\n')
print(tf.nn.moments(x, axes=[0]), '\n')
print(tf.nn.moments(x, axes=[0, 1]))
Tensor("random_uniform:0", shape=(8, 10, 4), dtype=float32)

(<tf.Tensor 'moments/Squeeze:0'   shape=(10, 4) dtype=float32>,
 <tf.Tensor 'moments/Squeeze_1:0' shape=(10, 4) dtype=float32>)

(<tf.Tensor 'moments_1/Squeeze:0'   shape=(4,) dtype=float32>,
 <tf.Tensor 'moments_1/Squeeze_1:0' shape=(4,) dtype=float32>)
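
As a quick sanity check (a minimal sketch, assuming TF2 with eager execution, where tensors evaluate immediately), the statistics returned by tf.nn.moments match NumPy's mean and variance reduced over the same axes:

import numpy as np
import tensorflow as tf

x_np = np.random.rand(8, 10, 4).astype('float32')
x = tf.constant(x_np)

# Collapse samples and timesteps; keep per-channel statistics
mean, var = tf.nn.moments(x, axes=[0, 1])
print(mean.shape, var.shape)  # (4,) (4,)

# tf.nn.moments computes the population variance (ddof=0), same as NumPy's default
np.testing.assert_allclose(mean.numpy(), x_np.mean(axis=(0, 1)), rtol=1e-5, atol=1e-6)
np.testing.assert_allclose(var.numpy(),  x_np.var(axis=(0, 1)),  rtol=1e-5, atol=1e-6)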

From the source code, math_ops.reduce_mean is used to compute both mean and variance, which operates in pseudocode as:

# axes = [0]
mean = (x[0, :, :] + x[1, :, :] + ... + x[7, :, :]) / 8
mean.shape == (10, 4)  # each slice's shape is (10, 4), so sum's shape is also (10, 4)

# axes = [0, 1]
mean = (x[0, 0,  :] + x[1, 0,  :] + ... + x[7, 0,  :] +
        x[0, 1,  :] + x[1, 1,  :] + ... + x[7, 1,  :] +
        ... +
        x[0, 9,  :] + x[1, 9,  :] + ... + x[7, 9,  :]) / (8 * 10)
mean.shape == (4, ) # each slice's shape is (4, ), so sum's shape is also (4, )

In other words, axes=[0] will compute (timesteps, channels) statistics with respect to samples - i.e. iterate over samples, computing the mean and variance of (timesteps, channels) slices. Hence, for

normalization to happen across the entire batch and across timesteps, meaning I am not trying to maintain separate mean/variance for different timesteps

you simply need to collapse the timesteps dimension (along with samples), and compute the statistics by iterating over both samples and timesteps:

axes = [0, 1]

The same goes for images, except since you have two non-channel/sample dimensions, you would do axes = [0, 1, 2] (to collapse samples, height, width).
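
Putting both cases from the question together (a minimal sketch; the 32x32 spatial size is arbitrary, and axes = [0, 1, 2, 3] for the 5D temporal image input is my assumption, extending the same collapse-everything-but-channels rule):

import tensorflow as tf

# Vector sequences: (batch, timesteps, channels)
seq = tf.random.uniform((8, 10, 4))
seq_mean, seq_var = tf.nn.moments(seq, axes=[0, 1])         # per-channel stats, shape (4,)

# Image sequences: (batch, timesteps, height, width, 3)
imgs = tf.random.uniform((8, 10, 32, 32, 3))
img_mean, img_var = tf.nn.moments(imgs, axes=[0, 1, 2, 3])  # per-channel stats, shape (3,)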


Pseudocode demo: see the actual mean computation in action

import tensorflow as tf
import tensorflow.keras.backend as K
import numpy as np

x = tf.constant(np.random.randn(8, 10, 4))
# Explicit chained sum over the samples axis, mirroring the pseudocode above
result1 = tf.add(x[0], tf.add(x[1], tf.add(x[2], tf.add(x[3], tf.add(x[4], 
                       tf.add(x[5], tf.add(x[6], x[7]))))))) / 8
# Built-in reduction over the same axis
result2 = tf.reduce_mean(x, axis=0)
print(K.eval(result1 - result2))
# small differences due to numeric imprecision
[[ 2.77555756e-17  0.00000000e+00 -5.55111512e-17 -1.38777878e-17]
 [-2.77555756e-17  2.77555756e-17  0.00000000e+00 -1.38777878e-17]
 [ 0.00000000e+00 -5.55111512e-17  0.00000000e+00 -2.77555756e-17]
 [-1.11022302e-16  2.08166817e-17  2.22044605e-16  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-5.55111512e-17  2.77555756e-17 -1.11022302e-16  5.55111512e-17]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00 -2.77555756e-17]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00 -5.55111512e-17]
 [ 0.00000000e+00 -3.46944695e-17 -2.77555756e-17  1.11022302e-16]
 [-5.55111512e-17  5.55111512e-17  0.00000000e+00  1.11022302e-16]]
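
Finally, to actually normalize with these statistics, they can be fed straight into tf.nn.batch_normalization, which the question already mentions. A minimal sketch (the epsilon value and the trainable per-channel offset/scale variables are my assumptions):

import tensorflow as tf

x = tf.random.uniform((8, 10, 4))          # (batch, timesteps, channels)
mean, var = tf.nn.moments(x, axes=[0, 1])  # per-channel stats across batch and timesteps

# Learnable per-channel shift and scale, as in standard batch norm
offset = tf.Variable(tf.zeros(4))
scale  = tf.Variable(tf.ones(4))

# Broadcasts the (4,)-shaped stats across the collapsed axes
x_norm = tf.nn.batch_normalization(x, mean, var, offset, scale, variance_epsilon=1e-5)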