Gradient calculation for softmax version of triplet loss

Question

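I have been trying to implement, in Caffe, the softmax version of the triplet loss described in Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.

I have tried this, but I am finding it hard to calculate the gradient because the L2 norm in the exponent is not squared.

Can someone please help me with this?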

Answer 1

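This is really a math question, but here it goes. The first equation is the one you are used to (squared L2 norm); the second is what you get when the norm is not squared.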
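The two equations themselves are not reproduced in this text. Assuming they are the standard derivatives of the squared and unsquared Euclidean distance with respect to x (the symbol y is only a placeholder for the other point, not notation from the paper), they would read:

```latex
\frac{\partial}{\partial x}\,\lVert x - y \rVert_2^2 = 2\,(x - y)

\frac{\partial}{\partial x}\,\lVert x - y \rVert_2 = \frac{x - y}{\lVert x - y \rVert_2}, \qquad x \neq y
```

The unsquared case is simply the difference vector normalized to unit length, and it is undefined when x = y, which is worth keeping in mind if two embeddings can coincide.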

Answer 2

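Implementing the L2 norm with existing Caffe layers can save you all the hassle.

Here is one way to compute ||x1-x2||_2 in Caffe for "bottom"s x1 and x2 (assuming x1 and x2 are B-by-C blobs, so that B norms are computed, one for each C-dimensional difference):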

# Element-wise difference: 1*x1 + (-1)*x2
layer {
  name: "x1-x2"
  type: "Eltwise"
  bottom: "x1"
  bottom: "x2"
  top: "x1-x2"
  eltwise_param { 
    operation: SUM
    coeff: 1 coeff: -1
  }
}
# Sum of squares over the C dimension: one value per row
layer {
  name: "sqr_norm"
  type: "Reduction"
  bottom: "x1-x2"
  top: "sqr_norm"
  reduction_param { operation: SUMSQ axis: 1 }
}
# Square root of the squared norm gives the L2 norm
layer {
  name: "sqrt"
  type: "Power"
  bottom: "sqr_norm"
  top: "sqrt"
  power_param { power: 0.5 }
}
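To make it easier to see what these three layers compute, here is a minimal plain-Python sketch of the same arithmetic. It is not Caffe code: the function name `l2_distance_rows` and the sample values are made up for illustration, and blobs are represented as B-by-C lists of lists.

```python
import math

def l2_distance_rows(x1, x2):
    """Row-wise ||x1 - x2||_2 for two B-by-C lists of lists (hypothetical helper)."""
    out = []
    for row1, row2 in zip(x1, x2):
        diff = [a - b for a, b in zip(row1, row2)]  # Eltwise SUM with coeff 1, -1
        sqr_norm = sum(d * d for d in diff)         # Reduction SUMSQ over axis 1
        out.append(math.sqrt(sqr_norm))             # Power with power 0.5
    return out

x1 = [[3.0, 4.0], [1.0, 1.0]]
x2 = [[0.0, 0.0], [1.0, 2.0]]
print(l2_distance_rows(x1, x2))  # [5.0, 1.0]
```

The output has one entry per row, matching the B-by-1 shape expected from the "sqrt" top.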

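For the triplet loss defined in the paper, you need to compute the L2 norm for x-x+ and for x-x-, concatenate these two blobs, and feed the concatenated blob to a "Softmax" layer.
No need for dirty gradient computations.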
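If you still want to convince yourself that the chained layers backpropagate the right thing, a small sanity check outside Caffe can help. The sketch below, in plain Python, computes the softmax over the two unsquared distances and compares a hand-derived chain-rule gradient of the first softmax output against finite differences. All function names and sample points are hypothetical, and only the softmax output is differentiated here, not whatever loss is then built on top of it.

```python
import math

def dist(a, b):
    """Unsquared Euclidean distance ||a - b||_2."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def softmax_pair(d_pos, d_neg):
    """Softmax over the two concatenated distances (max-shifted for stability)."""
    m = max(d_pos, d_neg)
    e_pos, e_neg = math.exp(d_pos - m), math.exp(d_neg - m)
    return e_pos / (e_pos + e_neg), e_neg / (e_pos + e_neg)

def d_plus(x, x_pos, x_neg):
    """First softmax output: the share assigned to the positive distance."""
    return softmax_pair(dist(x, x_pos), dist(x, x_neg))[0]

def analytic_grad(x, x_pos, x_neg):
    """Chain-rule gradient of d_plus w.r.t. x; assumes x != x_pos and x != x_neg."""
    dp, dn = dist(x, x_pos), dist(x, x_neg)
    p, n = softmax_pair(dp, dn)
    # d(d_plus)/d(dp) = p*n and d(d_plus)/d(dn) = -p*n;
    # d(dp)/dx = (x - x_pos)/dp and d(dn)/dx = (x - x_neg)/dn.
    return [p * n * ((xi - pi) / dp - (xi - ni) / dn)
            for xi, pi, ni in zip(x, x_pos, x_neg)]

def numeric_grad(x, x_pos, x_neg, eps=1e-6):
    """Central finite differences, used only as a cross-check."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((d_plus(hi, x_pos, x_neg) - d_plus(lo, x_pos, x_neg)) / (2 * eps))
    return grad

# Made-up sample points for the anchor, the positive, and the negative.
x, x_pos, x_neg = [1.0, 2.0], [0.0, 0.0], [4.0, 1.0]
print(analytic_grad(x, x_pos, x_neg))
print(numeric_grad(x, x_pos, x_neg))
```

The two printed vectors should agree to several decimal places; if they do, the unsquared norm poses no special difficulty beyond the division by the distance itself.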
