caffe: What does the group param mean?

Question

I have read the documentation about the group param:

group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.

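But first of all, I do not understand exactly what they mean. And secondly, why would I use it? Could anyone explain it a bit better?

As far as I understand it, it means the following:

If I set g greater than 1, my input and output channels are separated into groups. But how exactly is this done? If I set it to 20 and my input is 40, will I have two groups of 20? And if the output is 50, will I have one group of 20 and one group of 30?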

Answer 1

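The argument gives the number of groups, not their size. If you have 40 inputs and set g to 20, you get 20 "lanes" of 2 channels each. With 50 outputs an even split is not possible (50 is not a multiple of 20), and Caffe requires both the input and the output channel counts to be multiples of g, so that combination is rejected.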

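More commonly, you split into a small number of groups, such as 2. In that case you have two processing "lanes" or groups. For the 40=>50 layer you mention, each group has 20 inputs and 25 outputs. Each layer is split in half, and the forward and backward propagation of each group works only within its own half, for the range of layers to which the group parameter applies (I think that is all the way to the final layer).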

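The processing advantage is that instead of 40^2 input connections you have 2 sets of 20^2, or half as many. This speeds up processing by roughly 2x, with very little loss in convergence progress.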
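The halving can be checked for the 40=>50 layer with a short Python sketch. The helper name `conv_weight_count` is hypothetical and not part of Caffe; it counts weights only (no bias) and assumes both channel counts are multiples of the number of groups.

```python
def conv_weight_count(in_channels, out_channels, groups=1, kernel=1):
    """Count the weights of a (grouped) convolution, ignoring bias terms.

    Hypothetical helper for illustration; not a Caffe function.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be multiples of groups")
    in_per_group = in_channels // groups
    out_per_group = out_channels // groups
    # Each group connects only its own inputs to its own outputs.
    return groups * in_per_group * out_per_group * kernel * kernel


dense = conv_weight_count(40, 50, groups=1)    # 40 * 50
grouped = conv_weight_count(40, 50, groups=2)  # 2 * (20 * 25)
print(dense, grouped, dense / grouped)
```

With 2 groups the weight count drops from 2000 to 1000, which matches the "half as many" estimate above.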

Answer 2

And secondly, why would I use [grouping]?

This was originally presented as an optimization in the paper that sparked the current popularity cycle of neural networks:

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." In Advances in neural information processing systems, pp. 1097-1105. 2012.

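Figure 2 shows how grouping was used in that work. The caffe authors originally added this ability so they could replicate the AlexNet architecture. However, grouping has continued to show benefits in other settings.

For example, both Facebook and Google have published papers that essentially show that grouping can dramatically reduce resource usage while helping to preserve accuracy. The Facebook paper is ResNeXt and the Google paper is MobileNets.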

Answer 3

First, Caffe only defines the behavior when both input_channel and output_channel are multiples of group. We can confirm this from the source code:

CHECK_EQ(channels_ % group_, 0);
CHECK_EQ(num_output_ % group_, 0)
  << "Number of output should be multiples of group.";
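A minimal Python sketch of the same two conditions may make the rule easier to try out. The function name `check_group` is hypothetical, and the error text for the input check is made up here, since the quoted source attaches a message only to the output check.

```python
def check_group(channels, num_output, group):
    """Mirror the two divisibility checks quoted above.

    Hypothetical helper, not a Caffe API. Returns the per-group
    (input, output) channel counts, or raises ValueError.
    """
    if channels % group != 0:
        # Message invented for this sketch; the quoted check has none.
        raise ValueError("Number of input channels should be multiples of group.")
    if num_output % group != 0:
        raise ValueError("Number of output should be multiples of group.")
    return channels // group, num_output // group


print(check_group(40, 50, 2))  # 20 inputs and 25 outputs per group
```

Calling `check_group(40, 50, 20)` raises, because 50 is not a multiple of 20.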

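Second, the group parameter affects the number of filter parameters, specifically the channel size of each filter. The actual number of channels in each filter is input_channel/group. This can also be confirmed from the source code: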

vector<int> weight_shape(2);
weight_shape[0] = conv_out_channels_;
weight_shape[1] = conv_in_channels_ / group_;

Note that weight_shape[0] is the number of filters.

So, w.r.t. your question:

In Caffe, if input_channel is 40 and group is 20:
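The shape rule can be sketched in Python as follows. The function name `grouped_weight_shape` is hypothetical, and the sketch assumes the kernel height and width follow the first two dimensions, which is consistent with the (C, 1, K, K) filter shape quoted for the Deconvolution layer further down.

```python
def grouped_weight_shape(in_channels, out_channels, group, kernel_h, kernel_w):
    """Return the assumed weight blob shape of a grouped convolution.

    Hypothetical helper: dimension 0 is the number of filters,
    dimension 1 is the number of channels each filter actually sees.
    """
    if in_channels % group or out_channels % group:
        raise ValueError("channel counts must be multiples of group")
    return (out_channels, in_channels // group, kernel_h, kernel_w)


# 40 input channels, 20 filters, 20 groups, 3x3 kernel
print(grouped_weight_shape(40, 20, 20, 3, 3))
```

Each of the 20 filters here has only 2 channels instead of 40, so the layer holds 20 times fewer weights than the ungrouped version.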

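output_channel cannot be 50, because 50 is not a multiple of 20.
If output_channel is 20 (remember that this means you have 20 filters), every 2 input channels are responsible for one output channel. For example, the 0th output channel is computed from the 0th and 1st input channels and has no relation to the other input channels.
If output_channel is 40, each group maps its 2 input channels to 2 output channels. The well-known depthwise convolution is the case where group equals input_channel (group = 40 here, with output_channel = 40): each output channel is then computed from exactly one distinct input channel.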
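The channel mapping can be sketched in Python. The function name `input_channels_for_output` is hypothetical, and the sketch assumes channels are assigned to groups in contiguous blocks, which is consistent with output channel 0 reading input channels 0 and 1.

```python
def input_channels_for_output(out_index, in_channels, out_channels, group):
    """List the input channels that feed one output channel.

    Hypothetical helper; assumes contiguous blocks of channels per group.
    """
    if in_channels % group or out_channels % group:
        raise ValueError("channel counts must be multiples of group")
    in_per_group = in_channels // group
    out_per_group = out_channels // group
    g = out_index // out_per_group          # group that owns this output
    start = g * in_per_group                # first input channel of that group
    return list(range(start, start + in_per_group))


print(input_channels_for_output(0, 40, 20, 20))  # input 40, output 20, group 20
print(input_channels_for_output(5, 40, 40, 40))  # depthwise: group == input
```

In the first call the output channel depends on two input channels; in the depthwise call it depends on exactly one.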

w.r.t. deconvolution:

We almost always set group = output_channels. Here is the suggested configuration of the Deconvolution layer from the official doc:

layer {
  name: "upsample", type: "Deconvolution"
  bottom: "{{bottom_name}}" top: "{{top_name}}"
  convolution_param {
    kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
    num_output: {{C}} group: {{C}}
    pad: {{ceil((factor - 1) / 2.)}}
    weight_filler: { type: "bilinear" } bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}

It comes with the following explanation:

By specifying num_output: {{C}} group: {{C}}, it behaves as channel-wise convolution. The filter shape of this deconvolution layer will be (C, 1, K, K) where K is kernel_size, and this filler will set a (K, K) interpolation kernel for every channel of the filter identically. The resulting shape of the top feature map will be (B, C, factor * H, factor * W). Note that the learning rate and the weight decay are set to 0 in order to keep coefficient values of bilinear interpolation unchanged during training.
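The template expressions in the configuration can be evaluated with a small Python sketch. The function names are hypothetical, and the output size uses the standard transposed-convolution formula without dilation, stride * (size - 1) + kernel_size - 2 * pad, which is an assumption about how the layer computes its output shape.

```python
import math


def bilinear_upsample_params(factor):
    """Evaluate the kernel_size, stride and pad expressions from the template."""
    kernel_size = 2 * factor - factor % 2
    stride = factor
    pad = math.ceil((factor - 1) / 2.0)
    return kernel_size, stride, pad


def deconv_output_size(size, kernel_size, stride, pad):
    """Assumed transposed-convolution output size (no dilation)."""
    return stride * (size - 1) + kernel_size - 2 * pad


for factor in (2, 3, 4):
    k, s, p = bilinear_upsample_params(factor)
    print(factor, (k, s, p), deconv_output_size(16, k, s, p))
```

For each factor the 16-pixel input grows to factor * 16, in line with the (B, C, factor * H, factor * W) shape stated in the quoted explanation.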
