How to add the grad method to a theano Op?
I have created a theano.Op that returns the distance between each pair of points in two input sets, wrapping scipy's cdist:
import theano
from theano import gof
from scipy.spatial import distance

class Cdist(theano.Op):
    __props__ = ()

    def __init__(self):
        #self.fn = scipy_cdist2
        super(Cdist, self).__init__()

    def make_node(self, x, w):
        #print('make_node')
        return gof.Apply(self, [x, w], [x.type()])

    def perform(self, node, inputs, output_storage):
        #print('perform')
        x, w = inputs[0], inputs[1]
        z = output_storage[0]
        z[0] = distance.cdist(x, w, 'euclidean')
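For context, this is how the Op gets used (a minimal sketch; the variable names are just illustrative):

    import theano.tensor as T

    x = T.matrix('x')  # [mA, n]
    w = T.matrix('w')  # [mB, n]
    dist = Cdist()(x, w)  # symbolic [mA, mB] distance matrix
    f = theano.function([x, w], dist)  # perform() runs when f is called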
It works, but now I want to add the grad method. I have read the guide and the documentation about the grad method, but I still don't understand how it works. For example, in the guide, to get the gradient of a method that returns a*x + b, they use:
def grad(self, inputs, output_grads):
    return [a * output_grads[0] + b]
Why? I am going to quote what the documentation says about grad:
If the output list of the op is [f_1, ... f_n], then the list output_gradients is [grad_{f_1}(C), grad_{f_2}(C), ... , grad_{f_n}(C)]. If inputs consists of the list [x_1, ..., x_m], then Op.grad should return the list [grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)], where (grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i} (and i can stand for multiple dimensions).
Are they telling me that I have to write the gradient there? But in the example they combine output_grads with scalar values. I really don't understand it.
The documentation is not wrong. In the grad method you are supposed to write a symbolic expression, as opposed to the perform method, where you write a numerical expression.

The grad method is called from theano.grad, while perform is called inside the compiled function.
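To make that concrete, here is a minimal sketch of how the two entry points are reached (the variable names are illustrative):

    import theano
    import theano.tensor as T

    x = T.matrix('x')
    w = T.matrix('w')
    cost = Cdist()(x, w).sum()          # build the symbolic graph
    gx, gw = theano.grad(cost, [x, w])  # this calls Cdist.grad symbolically
    f = theano.function([x, w], [gx, gw])  # perform() runs only when f is called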
For example, assuming Euclidean distance:
def grad(self, inputs, out_grads):
    x, y = inputs  # matrices of shape [mA, n] and [mB, n]
    g, = out_grads  # matrix of shape [mA, mB]
    diff = x.dimshuffle(0, 'x', 1) - y.dimshuffle('x', 0, 1)  # [mA, mB, n] tensor
    z = T.sqrt(T.sum(T.sqr(diff), axis=2, keepdims=True))  # [mA, mB, 1]
    diff = g.dimshuffle(0, 1, 'x') * diff / z  # broadcast g over the feature axis
    return [T.sum(diff, axis=1), -T.sum(diff, axis=0)]
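You can check a hand-written gradient numerically against finite differences with theano.gradient.verify_grad; a minimal sketch (the shapes here are arbitrary):

    import numpy as np
    import theano
    from theano.gradient import verify_grad

    rng = np.random.RandomState(42)
    # assumes floatX='float64', since cdist returns float64 arrays
    x_val = rng.rand(5, 3).astype(theano.config.floatX)
    w_val = rng.rand(4, 3).astype(theano.config.floatX)
    verify_grad(Cdist(), [x_val, w_val], rng=rng)  # raises if grad disagrees with finite differences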
For this particular case, I would suggest writing an L_op instead of grad. The L_op additionally reuses the output of the forward Op, so the distances do not have to be recomputed.
def L_op(self, inputs, outputs, out_grads):
    x, y = inputs  # matrices of shape [mA, n] and [mB, n]
    z, = outputs  # matrix of shape [mA, mB]
    g, = out_grads  # idem
    diff = x.dimshuffle(0, 'x', 1) - y.dimshuffle('x', 0, 1)  # [mA, mB, n] tensor
    diff = g.dimshuffle(0, 1, 'x') * diff / z.dimshuffle(0, 1, 'x')
    return [T.sum(diff, axis=1), -T.sum(diff, axis=0)]
Well, the grad expression may still be wrong, but you get the idea.
As you can see, we are calling symbolic functions such as dimshuffle. However, there are cases where you want to write a class for the gradient Op instead, either because the symbolic graph is too inefficient or because you want a custom gradient.
For example:
class CDistGrad(theano.Op):
    def __init__(self):
        # <...>
        pass

    def c_code(self, node, name, inputs, outputs, sub):
        # implement this in case you want more performance
        pass

    def perform(self, node, inputs, output_storage):
        # <...>
        pass

    def make_node(self, x, y, g):
        # <...>
        pass


class CDist(theano.Op):
    # <...>
    def grad(self, inputs, output_grads):
        return CDistGrad()(*inputs, *output_grads)
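For illustration, here is a minimal sketch of what a numerical CDistGrad could look like, mirroring the symbolic expression above (the make_node signature and shapes are assumptions, not part of the original answer):

    import numpy as np
    import theano
    from theano import gof
    import theano.tensor as T

    class CDistGrad(theano.Op):
        __props__ = ()

        def make_node(self, x, y, g):
            # assumed signature: forward inputs x, y plus the output gradient g
            x, y, g = map(T.as_tensor_variable, (x, y, g))
            return gof.Apply(self, [x, y, g], [x.type(), y.type()])

        def perform(self, node, inputs, output_storage):
            x, y, g = inputs
            diff = x[:, None, :] - y[None, :, :]            # [mA, mB, n]
            z = np.sqrt(np.sum(diff ** 2, axis=2))          # [mA, mB] distances
            scaled = g[:, :, None] * diff / z[:, :, None]   # chain rule, per pair
            output_storage[0][0] = scaled.sum(axis=1)       # grad w.r.t. x
            output_storage[1][0] = -scaled.sum(axis=0)      # grad w.r.t. y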
Still, a symbolic expression is used in the grad method. It is just that a custom Op replaces the vanilla Theano expression.