Neural network training: the gradient of a module that is used multiple times in one iteration

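When training a neural network, if the same module is used several times within a single iteration, do its gradients need any special handling during backpropagation?

For example:

A Deformable Compensation module is used three times in this model, which means all three uses share the same weights.

What happens when I call loss.backward()? Will loss.backward() work correctly?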

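The nice thing about autograd and backward passes is that the underlying framework is not "algorithmic" but mathematical: it implements the chain rule of derivatives. There are therefore no "algorithmic" considerations for "shared weights" or for "weighting different layers"; it is pure math. The backward pass gives you the derivative of the loss function w.r.t. the weights in a purely mathematical way.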
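To make the chain-rule argument concrete, here is a short sketch of the math. The symbols are my own notation, not from the question: $\theta$ stands for the shared weights, $K$ for the number of times the module is applied (3 in the example), and $y_k$ for the output of the $k$-th application.

```latex
\[
\frac{d\mathcal{L}}{d\theta}
  = \sum_{k=1}^{K}
    \frac{d\mathcal{L}}{d y_k}\,
    \frac{\partial y_k}{\partial \theta}
\]
```

Here $\partial y_k / \partial \theta$ is the direct dependence of the $k$-th output on the weights with that application's input held fixed, and $d\mathcal{L}/d y_k$ is the gradient flowing back into that output. Each use contributes one term, and the terms are simply added together.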

Weights can be shared globally (e.g., when training Siamese networks), on a "layer level" (as in your example), but also within a layer. When you think about it, convolution layers and recurrent layers are a fancy way of sharing weights locally.

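Naturally, pytorch (like all other DL frameworks) handles these cases trivially.
As long as your "deformable compensation" layer is implemented correctly, pytorch will take care of the gradients for you in a mathematically correct manner, thanks to the chain rule.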
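If you want to convince yourself, a small check along these lines may help. It is only a sketch: `nn.Linear` is a hypothetical stand-in for the deformable compensation module, and the sizes, the `tanh` between uses, and the loss are arbitrary choices. It compares the gradient of one module used three times against the sum of the gradients of three separate copies that start from the same weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the shared "deformable compensation" module.
shared = nn.Linear(4, 4)
x = torch.randn(2, 4)

# Forward pass that applies the SAME module three times.
h1 = shared(x)
h2 = shared(torch.tanh(h1))
h3 = shared(torch.tanh(h2))
loss = h3.pow(2).mean()
loss.backward()  # gradients from all three uses accumulate in .grad
grad_shared = shared.weight.grad.clone()

# Reference: three independent modules initialised with identical weights.
copies = [nn.Linear(4, 4) for _ in range(3)]
for c in copies:
    c.load_state_dict(shared.state_dict())

g1 = copies[0](x)
g2 = copies[1](torch.tanh(g1))
g3 = copies[2](torch.tanh(g2))
ref_loss = g3.pow(2).mean()
ref_loss.backward()

# Chain rule: the shared gradient equals the sum of the per-use gradients.
grad_sum = sum(c.weight.grad for c in copies)
matches = torch.allclose(grad_shared, grad_sum, atol=1e-6)
print(matches)
```

One practical point: because `.grad` accumulates across `backward()` calls as well, you still need the usual `optimizer.zero_grad()` between iterations, but nothing extra is required for the repeated use within one iteration.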

machine-learning

deep-learning

pytorch

autograd