Softmax input dimension in the Pyfaster RCNN ZF network model

Question

I am trying to trace the steps in the prototxt file for the ZF net. The part I am unsure about is the softmax layer. rpn_cls_score is created with dimension (1,18,h,w):

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

It is then reshaped to dimension (1,2,9*h,w) here:

layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}
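A minimal NumPy sketch of what this Reshape layer does to the shape, assuming the usual Caffe semantics (dim: 0 copies the bottom's dimension at that position, dim: -1 is inferred) and row-major blob memory. The helper name caffe_style_reshape and the feature-map size h, w = 4, 5 are made up for illustration:

```python
import numpy as np

def caffe_style_reshape(blob, dims):
    """Mimic Caffe's Reshape rules: 0 copies the input dim, -1 is inferred."""
    shape = [blob.shape[i] if d == 0 else d for i, d in enumerate(dims)]
    # NumPy infers the single -1 and keeps row-major order, like a Caffe blob.
    return blob.reshape(shape)

h, w = 4, 5  # hypothetical feature-map size
scores = np.arange(1 * 18 * h * w, dtype=np.float32).reshape(1, 18, h, w)

reshaped = caffe_style_reshape(scores, (0, 2, -1, 0))
print(reshaped.shape)  # (1, 2, 36, 5), i.e. (1, 2, 9*h, w)

# Channel 0 of the reshaped blob holds original channels 0..8,
# channel 1 holds original channels 9..17.
second_half_matches = np.array_equal(
    reshaped[0, 1], scores[0, 9:].reshape(9 * h, w)
)
print(second_half_matches)
```

The reshape only reinterprets the same memory: no values are moved, which is why the second half of the 18 channels lines up with index 1 of the new axis.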

Finally it is passed to softmax:

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}

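My question is this. According to the Caffe online documentation, softmax takes a 1D input, but rpn_cls_score_reshape has dimension (1,2,9*h,w). Does softmax sum over all indices, or does it select a canonical axis and sum only over the remaining indices (as the C++ code seems to indicate)? In that case, it would split rpn_cls_score_reshape into two arrays, (1,channel=1,9*h,w) and (1,channel=2,9*h,w), one for each value of the second index, perform softmax within each by summing the exponentials of the 9*h*w components, and then recombine them into an array with the original dimension (1,2,9*h,w) and return that as rpn_cls_prob. If not, how does softmax handle a multi-dimensional input array?

Thanks.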

Answer 1

As documented for SoftmaxParameter in caffe.proto, it has an axis parameter that defaults to 1:

// The axis along which to perform the softmax -- may be negative to index
// from the end (e.g., -1 for the last axis).
// Any other axes will be evaluated as independent softmaxes.
optional int32 axis = 2 [default = 1];

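So your reading of the C++ implementation is right that softmax works along one chosen axis. As for how softmax handles N-D inputs with N > 1: it normalizes along axis (here axis 1, of size 2) and treats every index combination of the other axes as an independent softmax. For rpn_cls_score_reshape this means one 2-way (bg/fg) softmax at each of the 9*h*w positions; the exponentials are summed over the 2 entries along axis 1, not over the 9*h*w components.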
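To make the axis behaviour concrete, here is a small NumPy sketch of a softmax taken along axis 1 of a (1,2,9*h,w) array. It is not Caffe's code, only an illustration of the same normalization; the softmax helper, the random scores, and h, w = 4, 5 are assumptions for the example:

```python
import numpy as np

def softmax(x, axis=1):
    """Softmax along one axis; all other axes are treated independently."""
    shifted = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

h, w = 4, 5  # hypothetical feature-map size
rng = np.random.default_rng(0)
scores = rng.normal(size=(1, 2, 9 * h, w))  # stand-in for rpn_cls_score_reshape

prob = softmax(scores, axis=1)
print(prob.shape)  # same shape as the input: (1, 2, 36, 5)

# At every (9*h, w) position the bg and fg probabilities sum to 1.
print(np.allclose(prob.sum(axis=1), 1.0))  # True

# A single position is just a 2-way softmax over its (bg, fg) pair.
pair = scores[0, :, 3, 2]
by_hand = np.exp(pair) / np.exp(pair).sum()
print(np.allclose(prob[0, :, 3, 2], by_hand))  # True
```

The output keeps the input shape, so rpn_cls_prob is again (1,2,9*h,w), with index 0 along axis 1 holding background probabilities and index 1 holding foreground probabilities.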
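As for Faster RCNN, if you are only interested in the foreground boxes, you can split the rpn_cls_score blob and use only its second half (i.e., after training your net, set num_output: 9 # instead of 18, or use a Slice layer during training to take only the second half). Take care to change the caffemodel accordingly in case you train as usual and change num_output after training.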
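The note about changing the caffemodel can be sketched with plain NumPy arrays standing in for the layer's parameters. This assumes the usual Caffe convolution layout, weights of shape (num_output, channels, kernel_h, kernel_w) and a bias of shape (num_output,); the input channel count 256, the feature-map size, and the conv1x1 helper are assumptions for illustration, and no pycaffe calls are shown:

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, h, w = 256, 4, 5  # assumed channel count and feature-map size

# Stand-ins for the trained rpn_cls_score parameters (18 outputs, 1x1 kernel).
weights = rng.normal(scale=0.01, size=(18, c_in, 1, 1))
bias = rng.normal(scale=0.01, size=18)
features = rng.normal(size=(1, c_in, h, w))  # stand-in for rpn/output

def conv1x1(x, wgt, b):
    """1x1 convolution with stride 1 and no padding, as a channel mix."""
    return np.einsum('nchw,oc->nohw', x, wgt[:, :, 0, 0]) + b[None, :, None, None]

full = conv1x1(features, weights, bias)  # shape (1, 18, h, w)

# Keep only the rows that produce the second (foreground) half.
fg_weights, fg_bias = weights[9:], bias[9:]
fg_only = conv1x1(features, fg_weights, fg_bias)

print(fg_only.shape)                      # (1, 9, 4, 5)
print(np.allclose(fg_only, full[:, 9:]))  # True
```

Note that the 9-channel output holds raw foreground scores rather than softmax probabilities, since the background scores needed for the 2-way normalization are no longer computed.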

caffe

softmax