Make derivative zero in Theano

Question

I am trying to implement the LSTM optimizer from this paper: https://arxiv.org/pdf/1606.04474v1.pdf

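They assume that the derivative of the gradient with respect to the LSTM parameters is zero (in the paper's notation, ∂∇_t/∂φ = 0, where ∇_t is the optimizee gradient at step t and φ denotes the optimizer's parameters).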

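Looking at my code, I think this assumption is not applied when I optimize the loss function, because Theano is able to compute this gradient, and it does. How can I prevent it from doing so?

Here is the code: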

def step_opt(cell_previous, hid_previous, theta_previous, *args):
    # Value of the function being optimized at the current parameters
    func = self.func(theta_previous)

    # Its gradient w.r.t. the parameters, fed to the LSTM as input
    grad = theano.grad(func, theta_previous)
    input_n = grad.dimshuffle(0, 'x')

    # step recomputes the LSTM cell and hidden state
    cell, hid = step(input_n, cell_previous, hid_previous, *args)

    # The projected hidden state is added to the parameters as the update
    theta = theta_previous + hid.dot(self.W_hidden_to_output).dimshuffle(0)
    return cell, hid, theta, func

cell_out, hid_out, theta_out, loss_out = theano.scan(
         fn=step_opt,
         outputs_info=[cell_init, hid_init, theta_init, None],
         non_sequences=non_seqs,
         n_steps=self.n_steps,
         strict=True)[0]

loss = loss_out.sum()

Answer 1

I eventually found the answer myself. It is on this page: http://deeplearning.net/software/theano/library/gradient.html

We can use disconnected_grad(expr) to make backpropagation stop at expr.

optimization

machine-learning

theano

lstm