How to interpret TensorFlow's convolution filter and striding parameters?

Question

I'm trying to understand TensorFlow's convolution, in particular the formula

shape(output) = [batch,
             (in_height - filter_height + 1) / strides[1],
             (in_width - filter_width + 1) / strides[2],
             ...]

I would have expected the formula to be the following instead:

shape(output) = [batch,
             (in_height - filter_height) / strides[1] + 1,
             (in_width - filter_width) / strides[2] + 1,
             ...]

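Starting from a 32x32 image and applying a 5x5 filter with strides [1,3,3,1], my understanding is that this should yield a 10x10 output whose values are the convolutions of the regions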

 (0:4,0:4) ,  (0:4,3:7) ,  (0:4,6:10) , ...,  (0:4,27:31), 
 (3:7,0:4) ,  (3:7,3:7) ,  (3:7,6:10) , ...,  (3:7,27:31),
...
(27:31,0:4), (27:31,3:7), (27:31,6:10), ..., (27:31,27:31)

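So both dimensions should be floor((32-5)/3)+1=10 rather than floor((32-5+1)/3)=9. What am I missing here? Have I misunderstood how the convolution is performed and/or what the parameters mean? If so, which parameters should I use to get the selection above?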
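To make the counting above concrete, here is a minimal sketch in plain Python (no TensorFlow involved). It lists the start offsets of every 5-wide window that fits inside a 32-wide input at stride 3 and compares the count with both formulas. The helper name `window_starts` is made up for this illustration.

```python
def window_starts(in_size, filter_size, stride):
    """Start offsets of every filter window that fits entirely inside the input."""
    return list(range(0, in_size - filter_size + 1, stride))

starts = window_starts(32, 5, 3)

# Formula as quoted from the documentation (integer division).
documented = (32 - 5 + 1) // 3
# Formula proposed in the question.
expected = (32 - 5) // 3 + 1

print(starts)                              # offsets 0, 3, 6, ..., 27
print(len(starts), documented, expected)
```

The last window starts at offset 27 and covers 27 through 31, so ten windows fit along each dimension, which matches the second formula and not the documented one.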

Answer 1

You are right - it should be:

ceil(float(in_height - filter_height + 1) / float(strides[1]))

For 32, 5, stride=3, this becomes: ceil(9.33) = 10.
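As a quick sanity check, the sketch below compares this ceil form with the floor form from the question for the unpadded case only. The function names `out_ceil` and `out_floor` are hypothetical and exist just for this comparison; padding is not modelled here.

```python
import math

def out_ceil(in_size, filter_size, stride):
    """Output size using ceil((in - filter + 1) / stride)."""
    return math.ceil((in_size - filter_size + 1) / stride)

def out_floor(in_size, filter_size, stride):
    """Output size using floor((in - filter) / stride) + 1."""
    return (in_size - filter_size) // stride + 1

# Brute-force comparison over small sizes with filter_size <= in_size.
mismatches = [
    (n, f, s)
    for n in range(1, 65)
    for f in range(1, n + 1)
    for s in range(1, 9)
    if out_ceil(n, f, s) != out_floor(n, f, s)
]

print(out_ceil(32, 5, 3), out_floor(32, 5, 3))
print(mismatches)
```

Over this range the two expressions agree whenever the filter fits inside the input, so without padding they describe the same output size.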

This has been fixed and will be pushed to GitHub soon. Thanks for catching this! For details, see the GitHub bug discussion, issue #196.

Answer 2

According to issue #196, this part of the documentation is apparently wrong; I think dga's answer still has a problem.

It should be:

floor((in_height+y_padding-filter_height)/y_stride) + 1,

When padding=VALID, y_padding=0.
When padding=SAME, y_padding should in general be chosen so that (in_height+y_padding-filter_height)/y_stride is an integer, which makes the 'floor' unnecessary.
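The formula with padding can be sketched in plain Python as follows. This is an illustration under an assumption, not TensorFlow's implementation: for SAME it assumes the output size is ceil(in_height / y_stride) and that the total padding is the smallest non-negative amount that makes that many windows fit, which is how I understand the TensorFlow documentation to describe it. The helper names `same_padding` and `output_size` are made up for this example.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Total padding along one dimension under the assumed SAME rule."""
    out_size = math.ceil(in_size / stride)  # assumption: SAME output size
    return max((out_size - 1) * stride + filter_size - in_size, 0)

def output_size(in_size, filter_size, stride, padding="VALID"):
    """floor((in + pad - filter) / stride) + 1, with pad = 0 for VALID."""
    pad = same_padding(in_size, filter_size, stride) if padding == "SAME" else 0
    return (in_size + pad - filter_size) // stride + 1

print(output_size(32, 5, 3, "VALID"))
print(same_padding(32, 5, 3), output_size(32, 5, 3, "SAME"))
```

For the 32x32 image with a 5x5 filter and stride 3, this gives 10 per dimension for VALID. Under the assumed SAME rule the total padding is 3, so (32+3-5)/3 is exactly 10 and the output is 11 per dimension with no rounding needed.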

