Why are PyTorch "convolutions" implemented as cross-correlations?

PyTorch convolutions are actually implemented as cross-correlations. This shouldn't cause any issues when training a convolution layer, since one is just a flipped version of the other (and hence the learned function will be equally powerful), but it does matter when:

  1. trying to use the functional library to implement an actual convolution (see the sketch after this list)
  2. trying to copy the weights of an actual convolution from another deep learning library
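
A minimal sketch of what this looks like in practice, assuming torch is installed: torch.nn.functional.conv2d computes a cross-correlation, so a true convolution requires flipping the kernel along both spatial axes first.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)    # input, shape (N, C_in, H, W)
k = torch.randn(1, 1, 3, 3)    # kernel, shape (C_out, C_in, kH, kW)

# What PyTorch calls a "convolution" is a cross-correlation.
xcorr = F.conv2d(x, k, padding=1)

# A true (discrete) convolution: flip the kernel along both spatial axes first.
conv = F.conv2d(x, torch.flip(k, dims=[-2, -1]), padding=1)

# The two differ unless the kernel happens to be symmetric.
print(torch.allclose(xcorr, conv))   # almost certainly False

# Likewise, weights taken from a library that uses true convolutions would
# need to be flipped before loading them into a PyTorch Conv2d layer.
```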

The authors say the following in Deep Learning with PyTorch:

Convolution, or more precisely, discrete convolution[1]...

[1] There is a subtle difference between PyTorch's convolution and mathematics' convolution: one argument's sign is flipped. If we were in a pedantic mood, we could call PyTorch's convolutions discrete cross-correlations.
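
Concretely, in 1D the two operations differ only in the sign of the shift inside the sum (standard definitions, not quoted from the book):

$$(f * g)[n] = \sum_m f[m]\, g[n - m] \qquad\text{(convolution)}$$
$$(f \star g)[n] = \sum_m f[m]\, g[n + m] \qquad\text{(cross-correlation)}$$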

But they don't explain why it is implemented this way. Is there a reason?

Maybe it's similar to how PyTorch implements CrossEntropyLoss: an analogous function that takes "logits" as inputs instead of raw probabilities (to avoid numerical instability)?
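
For reference, a minimal sketch of that analogy (assuming torch is installed): CrossEntropyLoss takes raw logits and fuses the log-softmax into the loss, rather than taking probabilities.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)             # raw scores, not probabilities
target = torch.randint(0, 10, (4,))

# cross_entropy works directly on logits...
ce = F.cross_entropy(logits, target)

# ...and is equivalent to log-softmax followed by negative log-likelihood.
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(ce, nll))           # True
```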

I think the reason is fairly simple. As you said, a convolution is just a flipped cross-correlation, and in the context of training a CNN that makes no difference. So we can skip the flip, which simplifies the code and reduces computation time:

The advantage of cross-correlation is that it avoids the additional step of flipping the filters to perform the convolutions.

Flipping the kernel has no effect on numerical stability either. The operation remains the same.
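
To illustrate the point, a minimal sketch (assuming torch is installed; the kernel size, data shapes, and training hyperparameters are arbitrary choices): if the target function is a true convolution, a cross-correlation layer simply learns the flipped kernel, so nothing is lost by skipping the flip.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
true_k = torch.randn(1, 1, 3, 3)                     # kernel of a "true" convolution

x = torch.randn(256, 1, 8, 8)
# Targets produced by a true convolution: flip the kernel, then cross-correlate.
y = F.conv2d(x, torch.flip(true_k, dims=[-2, -1]), padding=1)

# Fit a plain cross-correlation (what PyTorch calls a convolution) to the data.
w = torch.zeros(1, 1, 3, 3, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = F.mse_loss(F.conv2d(x, w, padding=1), y)
    loss.backward()
    opt.step()

# The learned weights approximate the *flipped* true kernel.
print(torch.allclose(w.detach(), torch.flip(true_k, dims=[-2, -1]), atol=1e-2))
```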