Remove bias from the convolution if the convolution is followed by a normalization layer

def __init__(self):
    super().__init__()

    self.conv = nn.Sequential(
        nn.Conv2d(32, 64, kernel_size=5, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        
        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),

        nn.AvgPool2d()
    )

    conv_out_size = self._get_conv_out((32, 110, 110))

    self.fc = nn.Sequential(
        nn.Linear(conv_out_size, 1),
        nn.Sigmoid(),
    )

I have this model and everything looks fine to me. However, I've been told that if a convolution is followed by a normalization layer, I should remove the bias from the convolution, because the normalization layer already contains a bias parameter. Can you explain why that is and how to do it?

Batch normalization computes gamma * normalize(x) + beta, where normalize(x) subtracts the mean and divides by the standard deviation. Any constant bias added by the preceding convolution is therefore cancelled out during the mean subtraction, and its effect is absorbed by the BatchNorm shift parameter beta, so keeping it in the convolution is redundant.
You can pass bias=False to the convolution layer to avoid this redundancy, since the default value of bias in pytorch is True.
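As a minimal sketch, the first block of your model would then look like this (layer sizes copied from your question, everything else unchanged):

    self.conv = nn.Sequential(
        # bias=False: the BatchNorm2d that follows subtracts the mean,
        # so a convolution bias would be cancelled out anyway
        nn.Conv2d(32, 64, kernel_size=5, stride=2, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        # ... and so on for the remaining Conv2d/BatchNorm2d pairs
    )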

The answer has already been accepted, but I would still like to add something here. One of the advantages of Batch Normalization is that it can be folded into the preceding convolution layer. This means we can replace the convolution plus batch normalization pair with a single convolution that uses rescaled weights and a new bias: new_weights = weights * gamma / sqrt(variance + epsilon) and new_bias = beta + (bias - mean) * gamma / sqrt(variance + epsilon). Folding batch norm (typically done for inference) is a good practice; you can refer to the link here: Folding Batch Norm.

I have also written a small python script so you can see how it works. Please check it below.

import numpy as np

def fold_batch_norm(conv_layer, bn_layer):
    """Fold the batch normalization parameters into the weights of
    the previous layer."""
    conv_weights = conv_layer.get_weights()[0]

    # Keras stores the learnable weights for a BatchNormalization layer
    # as four separate arrays:
    #   0 = gamma (if scale == True)
    #   1 = beta (if center == True)
    #   2 = moving mean
    #   3 = moving variance
    bn_weights = bn_layer.get_weights()
    gamma = bn_weights[0]
    beta = bn_weights[1]
    mean = bn_weights[2]
    variance = bn_weights[3]

    # Should match the epsilon configured on the BatchNormalization layer
    epsilon = 1e-7
    new_weights = conv_weights * gamma / np.sqrt(variance + epsilon)
    param = conv_layer.get_config()

    # Handle both cases: the convolution may or may not have its own bias
    if param['use_bias']:
        bias = conv_layer.get_weights()[1]
        new_bias = beta + (bias - mean) * gamma / np.sqrt(variance + epsilon)
    else:
        new_bias = beta - mean * gamma / np.sqrt(variance + epsilon)
    return new_weights, new_bias
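A hypothetical usage sketch, assuming a Keras model in which a Conv2D layer named 'conv1' is immediately followed by a BatchNormalization layer named 'bn1' (the layer names and folded_model are placeholders, not part of the script above):

    # Fold the pair, then load the result into a copy of the model built
    # without BatchNorm, whose matching Conv2D was created with use_bias=True
    new_weights, new_bias = fold_batch_norm(model.get_layer('conv1'),
                                            model.get_layer('bn1'))
    folded_model.get_layer('conv1').set_weights([new_weights, new_bias])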

You can also keep this idea in mind for your future projects. Cheers :)