How to add the grad method to a theano Op?

I have created a theano.Op that returns the distance between each pair of rows of two input sets, wrapping scipy's cdist:

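import theano
from theano import gof
from scipy.spatial import distance

class Cdist(theano.Op):

    __props__ = ()

    def __init__(self):
        #self.fn = scipy_cdist2
        super(Cdist, self).__init__()

    def make_node(self, x, w):
        # The output reuses the type of x (a matrix with the same dtype).
        return gof.Apply(self, [x, w], [x.type()])

    def perform(self, node, inputs, output_storage):
        x, w = inputs[0], inputs[1]
        z = output_storage[0]
        # Pairwise Euclidean distances between the rows of x and the rows of w.
        z[0] = distance.cdist(x, w, 'euclidean')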

It works, but now I want to add the grad method. I have read the guide and the documentation about the grad method, but I still don't understand how it works. For example, in the guide, to get the gradient of a method that returns a*x + b, they use:

def grad(self, inputs, output_grads):
    return [a * output_grads[0] + b] 

Why? To quote the documentation about grad:


If the output list of the op is [f_1, ... f_n], then the list output_gradients is [grad_{f_1}(C), grad_{f_2}(C), ... , grad_{f_n}(C)]. If inputs consists of the list [x_1, ..., x_m], then Op.grad should return the list [grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)], where (grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i} (and i can stand for multiple dimensions).

So they tell me I have to write a gradient? But in the example they combine output_grads with integer values. I really don't understand it.

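The documentation is not wrong. In the grad method you write a symbolic expression, as opposed to the perform method, where you write a numerical expression.

The grad method is called from theano.grad, while perform is called inside the compiled function.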
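A plain-Python sketch may help with why output_grads shows up in the returned expression. grad receives dC/df for each output and has to return dC/dx for each input, which by the chain rule is dC/df times the local derivative df/dx. The cost C = f**2, the constants and the helper names below are made up for illustration, and floats stand in for what would be symbolic variables in Theano:

```python
# Chain rule behind Op.grad, shown with plain floats instead of symbolic variables.
# Assumption: f(x) = a * x + b and an arbitrary downstream cost C(f) = f ** 2.
a, b = 3.0, 2.0

def f(x):
    return a * x + b

def cost(x):
    return f(x) ** 2

def grad_wrt_input(output_grad):
    # output_grad plays the role of output_grads[0], i.e. dC/df.
    # The local derivative df/dx is a, so dC/dx = dC/df * a.
    return output_grad * a

x0 = 0.5
dC_df = 2.0 * f(x0)                  # the value handed over as output_grads[0]
analytic = grad_wrt_input(dC_df)

# Central finite difference of the full cost as an independent check.
eps = 1e-6
numeric = (cost(x0 + eps) - cost(x0 - eps)) / (2 * eps)

print(analytic, numeric)
```

In this sketch b does not appear in the result, since the derivative of a constant offset is zero. Applied to the Op in the question, the same idea written with symbolic Theano expressions: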


def grad(self, inputs, out_grads):
    # Assumes `import theano.tensor as T`.
    x, y = inputs   # matrices of shape [mA, n] and [mB, n]
    g, = out_grads   # matrix of shape [mA, mB]
    diff = x.dimshuffle(0, 'x', 1) - y.dimshuffle('x', 0, 1)   # [mA, mB, n] tensor
    z = T.sqrt(T.sum(T.sqr(diff), axis=2, keepdims=True))   # [mA, mB, 1] tensor
    # g needs a trailing broadcastable axis to line up with diff and z.
    diff = g.dimshuffle(0, 1, 'x') * diff / z
    return [T.sum(diff, axis=1), -T.sum(diff, axis=0)]

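For this particular case, I would suggest writing L_op instead of grad. L_op additionally receives the outputs of the forward Op, so they can be reused.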

def L_op(self, inputs, outputs, out_grads):
    x, y = inputs   # matrices of shape [mA, n] and [mB, n]
    z, = outputs   # matrix of shape [mA, mB]
    g, = out_grads   # idem
    diff = x.dimshuffle(0, 'x', 1) - y.dimshuffle('x', 0, 1)   # [mA, mB, n] tensor
    diff = g.dimshuffle(0, 1, 'x') * diff / z.dimshuffle(0, 1, 'x')
    return [T.sum(diff, axis=1), -T.sum(diff, axis=0)]

Well, the grad expressions may be wrong, but you get the idea.
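One way to gain some confidence in such an expression is to compare it against finite differences. The sketch below mirrors the symbolic code in NumPy (`x[:, None, :]` playing the role of `x.dimshuffle(0, 'x', 1)`); the helper names, shapes and random seed are my own choices, and the scalar cost sum(g * dist) is picked only because its gradient with respect to the distances is exactly g:

```python
import numpy as np

def cdist_euclidean(x, y):
    # Pairwise Euclidean distances between rows, shape [mA, mB].
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def cdist_grad(x, y, g):
    # NumPy mirror of the symbolic grad expression.
    diff = x[:, None, :] - y[None, :, :]                    # [mA, mB, n]
    z = np.sqrt((diff ** 2).sum(axis=2, keepdims=True))     # [mA, mB, 1]
    diff = g[:, :, None] * diff / z
    return diff.sum(axis=1), -diff.sum(axis=0)

def numeric_grad(fun, arr, eps=1e-6):
    # Central finite differences of a scalar function, one entry at a time.
    out = np.zeros_like(arr)
    for idx in np.ndindex(*arr.shape):
        plus, minus = arr.copy(), arr.copy()
        plus[idx] += eps
        minus[idx] -= eps
        out[idx] = (fun(plus) - fun(minus)) / (2 * eps)
    return out

rng = np.random.RandomState(0)
x = rng.randn(4, 3)
y = rng.randn(5, 3)
g = rng.randn(4, 5)   # stands in for out_grads[0]

gx, gy = cdist_grad(x, y, g)
gx_num = numeric_grad(lambda a: (g * cdist_euclidean(a, y)).sum(), x)
gy_num = numeric_grad(lambda b: (g * cdist_euclidean(x, b)).sum(), y)

print(np.allclose(gx, gx_num, atol=1e-5), np.allclose(gy, gy_num, atol=1e-5))
```

This only checks the formula at random points where no two rows coincide; at a zero distance the division by z is undefined. If I remember correctly, Theano also ships theano.gradient.verify_grad for running this kind of check on the Op itself.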

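As you can see, we are calling symbolic functions such as dimshuffle. However, there are cases where you want to write a separate class for the grad Op, either because the symbolic graph is too inefficient or because you want a custom gradient.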


class CDistGrad(theano.Op):
    def __init__(...):
        # <...>
    def c_code(...):
        # implement this in case you want more performance
    def perform(...):
        # <...>
    def make_node(...):
        # <...>

class CDist(theano.Op):
    # <...>
    def grad(self, inputs, output_grads):
        # CDistGrad needs one output per input, so that a list of gradients is returned.
        return CDistGrad()(*inputs, *output_grads)

Still, symbolic expressions are used in the grad method. It is just that a custom Op replaces the plain Theano expressions.
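For completeness, here is a rough idea of what the numerical side of such a grad Op could compute. It is written as a standalone function that imitates the output_storage convention of perform; the name cdist_grad_perform, the input order (x, y, g) and the choice to return a zero gradient where two points coincide are all assumptions of this sketch, not something Theano prescribes:

```python
import numpy as np

def cdist_grad_perform(inputs, output_storage):
    # Hypothetical helper: same convention as Op.perform, minus self and node.
    x, y, g = inputs                               # forward inputs, then the output gradient
    diff = x[:, None, :] - y[None, :, :]           # [mA, mB, n]
    z = np.sqrt((diff ** 2).sum(axis=2))           # [mA, mB]
    # Assumption: use a zero gradient where the distance is exactly zero.
    safe_z = np.where(z > 0, z, 1.0)
    w = np.where(z > 0, g / safe_z, 0.0)
    weighted = w[:, :, None] * diff
    output_storage[0][0] = weighted.sum(axis=1)    # gradient w.r.t. x, shape [mA, n]
    output_storage[1][0] = -weighted.sum(axis=0)   # gradient w.r.t. y, shape [mB, n]

x = np.array([[0.0, 0.0], [3.0, 4.0]])
y = np.array([[0.0, 0.0]])
g = np.ones((2, 1))
storage = [[None], [None]]
cdist_grad_perform([x, y, g], storage)
gx, gy = storage[0][0], storage[1][0]
print(gx)
print(gy)
```

Because this function fills two storage cells, the matching make_node would have to declare two outputs, one per input of the forward Op.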