[Theano] How to evaluate gradient based on shared variables

Question

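I am currently facing this problem: I cannot evaluate my symbolic gradient variables in a recurrent neural network coded with Theano. Here is the code: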

  W_x = theano.shared(init_W_x, name='W_x')
  W_h = theano.shared(init_W_h, name='W_h')
  W_y = theano.shared(init_W_y, name='W_y')
  [self.y, self.h], _ = theano.scan(self.step,
                                    sequences=self.x,
                                    outputs_info=[None, self.h0])

  error = ((self.y - self.t) ** 2).sum()

  gW_x, gW_y, gW_h = T.grad(self.error, [W_x, W_h, W_y])

  [...]

  def step(self, x_t, h_tm1):
      h_t = T.nnet.sigmoid(T.dot(self.W_x, x_t) + T.dot(h_tm1, self.W_h))
      y_t = T.dot(self.W_y, h_t)
      return y_t, h_t

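I have kept only what I thought was relevant.
I would like to be able to compute, for example, 'gW_x', but when I try to wrap it in a Theano function it does not work, because its dependencies (W_x, W_h, W_y) are shared variables.

Thank you very much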

Answer 1

I believe that in this case you need to pass the shared variables to the function self.step through the non_sequences argument of theano.scan.

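So you need to change the signature of self.step to take three additional arguments, corresponding to the shared variables, and then add the argument non_sequences=[W_x, W_h, W_y] to theano.scan.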

Also, I suspect you may have made a typo in the second-to-last line - should it be error = ((self.y - t) ** 2).sum()?

machine-learning

theano