How to decide which mode to use for 'kaiming_normal' initialization

Question

I have read several pieces of code that initialize layers with PyTorch's nn.init.kaiming_normal_(). Some of them use the default fan_in mode. One of many examples can be found here and is shown below.

init.kaiming_normal(m.weight.data, a=0, mode='fan_in')

However, I sometimes see people use the fan_out mode instead, as shown here.

if isinstance(m, nn.Conv2d):
    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

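Can someone give me some guidance or hints to help me decide which mode to select? Also, I am using PyTorch for image super-resolution and denoising tasks. Which mode would be more beneficial there?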

Answer 1

According to the documentation:

Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

And according to Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (He, K. et al., 2015):

We note that it is sufficient to use either Eqn.(14) or Eqn.(10)

where Eqn.(10) and Eqn.(14) correspond to fan_in and fan_out, respectively. Furthermore:

This means that if the initialization properly scales the backward signal, then this is also the case for the forward signal; and vice versa. For all models in this paper, both forms can make them converge
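For reference, the two conditions the quotes refer to can be written as below. This is my transcription of the paper's notation, with n_l the number of input connections per output unit (fan_in), \hat{n}_l the number of output connections per input unit (fan_out), k_l the kernel size, and c_l, d_l the input and output channel counts of layer l; check it against the paper before relying on it.

```latex
% Eqn. (10): forward condition, corresponds to mode='fan_in'
\tfrac{1}{2}\, n_l \operatorname{Var}[w_l] = 1, \quad \forall l,
\qquad n_l = k_l^2 c_l

% Eqn. (14): backward condition, corresponds to mode='fan_out'
\tfrac{1}{2}\, \hat{n}_l \operatorname{Var}[w_l] = 1, \quad \forall l,
\qquad \hat{n}_l = k_l^2 d_l
```

Both give a zero-mean Gaussian whose standard deviation is sqrt(2 / fan), differing only in which fan is used.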

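So all in all it doesn't matter much; it depends more on what you are after. I assume that if you suspect your backward pass might be more "chaotic" (greater variance), it is worth changing the mode to fan_out. This might happen when the loss oscillates a lot (e.g. very easy examples followed by very hard ones).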
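To make the difference between the two modes concrete, here is a minimal sketch that initializes the same convolution with each mode and compares the resulting weight standard deviation with gain / sqrt(fan). The layer shape (64 to 128 channels, 3x3 kernel) and the helper weight_std are assumptions made up for illustration; they do not come from the question.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical layer, chosen only for illustration: 64 -> 128 channels, 3x3 kernel.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)

kernel_area = conv.kernel_size[0] * conv.kernel_size[1]
fan_in = conv.in_channels * kernel_area    # 64 * 9 = 576
fan_out = conv.out_channels * kernel_area  # 128 * 9 = 1152
gain = nn.init.calculate_gain('relu')      # sqrt(2) for ReLU


def weight_std(mode):
    """Re-initialize conv.weight with the given mode and return its sample std."""
    nn.init.kaiming_normal_(conv.weight, mode=mode, nonlinearity='relu')
    return conv.weight.std().item()


std_in = weight_std('fan_in')
std_out = weight_std('fan_out')

# kaiming_normal_ samples from N(0, std^2) with std = gain / sqrt(fan).
expected_in = gain / math.sqrt(fan_in)
expected_out = gain / math.sqrt(fan_out)

print(f"fan_in : measured std {std_in:.4f}, expected {expected_in:.4f}")
print(f"fan_out: measured std {std_out:.4f}, expected {expected_out:.4f}")
```

The two modes only differ when fan_in and fan_out differ, that is, when a layer changes the number of channels (or features). For a layer with equal input and output channels they give the same distribution.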

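Choosing the correct nonlinearity matters more. Here nonlinearity is the activation applied after the layer you are currently initializing. The current defaults are nonlinearity='leaky_relu' with a=0, which is effectively the same as relu. If you are using leaky_relu, you should set a to its negative slope.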
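As a rough sketch of that last point, the snippet below initializes a layer that is followed by a LeakyReLU and passes the same slope to the initializer. The slope 0.2 and the layer shape are assumptions picked for illustration, not values from the question.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

negative_slope = 0.2  # assumed slope, for illustration only

# Hypothetical block: a conv layer followed by the activation it is initialized for.
layer = nn.Conv2d(64, 64, kernel_size=3, padding=1)
activation = nn.LeakyReLU(negative_slope)

# Pass the activation's slope as `a` so the gain matches the nonlinearity.
nn.init.kaiming_normal_(layer.weight, a=negative_slope,
                        mode='fan_in', nonlinearity='leaky_relu')
nn.init.zeros_(layer.bias)

# Gain for leaky_relu is sqrt(2 / (1 + a^2)); with a = 0 it reduces to the ReLU gain.
gain_relu = nn.init.calculate_gain('relu')
gain_leaky_zero = nn.init.calculate_gain('leaky_relu', 0.0)
gain_leaky = nn.init.calculate_gain('leaky_relu', negative_slope)

fan_in = layer.in_channels * layer.kernel_size[0] * layer.kernel_size[1]
expected_std = gain_leaky / math.sqrt(fan_in)
measured_std = layer.weight.std().item()

print(f"relu gain {gain_relu:.4f}, leaky_relu(a=0) gain {gain_leaky_zero:.4f}")
print(f"leaky_relu(a={negative_slope}) gain {gain_leaky:.4f}")
print(f"measured std {measured_std:.4f}, expected {expected_std:.4f}")
```

For small slopes the gain barely changes, so forgetting to set a is usually harmless; it matters more as the slope grows.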

initialization

pytorch