Theano - SGD with 1 hidden layer

I'm following Newmu's logistic regression tutorial from his GitHub. I want to add a hidden layer to his model, so I split the weights variable into two arrays, h_w and o_w. The problem is that the update can't operate on a list (w = [h_w, o_w]):

 "File "C:/Users/Dis/PycharmProjects/untitled/MNISTnet.py",
 line 32, in <module>
     **update = [[w, w - gradient * 0.05]] TypeError: can't multiply sequence by non-int of type 'float'**"

I'm a beginner with Theano and NumPy, and the Theano documentation didn't help me. I found the stack() function, but when I combine the weights with w = T.stack([h_w, o_w], axis=1), Theano raises this error:

Traceback (most recent call last):
  File "C:\Users\Dis\PycharmProjects\untitled\MNISTnet.py", line 35, in <module>
    gradient = T.grad(cost=cost, wrt=w)
  File "C:\Program Files\Anaconda2\lib\site-packages\theano-0.9.0.dev1-py2.7.egg\theano\gradient.py", line 533, in grad
    handle_disconnected(elem)
  File "C:\Program Files\Anaconda2\lib\site-packages\theano-0.9.0.dev1-py2.7.egg\theano\gradient.py", line 520, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError:  
Backtrace when that variable is created:

  File "C:\Users\Dis\PycharmProjects\untitled\MNISTnet.py", line 30, in <module>
    w = T.stack([h_w, o_w], axis=1)

So, my question is: how do I convert the list [<TensorType(float64, matrix)>, <TensorType(float64, matrix)>] into a single variable <TensorType(float64, matrix)>?

My full code:

import theano
from theano import tensor as T
import numpy as np
from load import mnist

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def model(X, o_w, h_w):
    hid = T.nnet.sigmoid(T.dot(X, h_w))
    out = T.nnet.softmax(T.dot(hid, o_w))
    return out

trX, teX, trY, teY = mnist(onehot=True)

X = T.fmatrix()
Y = T.fmatrix()

h_w = init_weights((784, 625))
o_w = init_weights((625, 10))

py_x = model(X, o_w, h_w)
y_pred = T.argmax(py_x, axis=1)
w = [o_w, h_w]

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)  # wrt is a list, so this returns a list of gradients
print type(gradient)
update = [[w, w - gradient * 0.05]]  # fails: a Python list can't be multiplied by a float

T.grad(..) returns the gradient w.r.t. each parameter in the list, so you cannot write [w, w - gradient * 0.05]; you have to pair each parameter with the gradient[i] it belongs to. Also, it's not a good idea to use stack for multiple parameters; a simple list is good enough (check this tutorial).
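To see that concretely, here is a quick check of my own (not part of the original answer), reusing cost, o_w and h_w from the question's code:

grads = T.grad(cost=cost, wrt=[o_w, h_w])  # one gradient expression per parameter
print type(grads)  # <type 'list'>; multiplying a list by 0.05 raises the TypeError above

This should work: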

import theano
from theano import tensor as T
import numpy as np
from load import mnist

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def model(X, o_w, h_w):
    hid = T.nnet.sigmoid(T.dot(X, h_w))
    out = T.nnet.softmax(T.dot(hid, o_w))
    return out

trX, teX, trY, teY = mnist(onehot=True)

X = T.fmatrix()
Y = T.fmatrix()

h_w = init_weights((784, 625))
o_w = init_weights((625, 10))

py_x = model(X, o_w, h_w)
y_pred = T.argmax(py_x, axis=1)
w = [o_w, h_w]

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
gradient = T.grad(cost=cost, wrt=w)  # list of gradients, in the same order as w
print type(gradient)
update = [[o_w, o_w - gradient[0] * 0.05],  # w = [o_w, h_w], so gradient[0] goes with o_w
          [h_w, h_w - gradient[1] * 0.05]]  # ...and gradient[1] with h_w
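Since gradient comes back in the same order as w, you can also build the updates with a comprehension instead of indexing by hand (a small variation of mine, not from the original answer), which scales to any number of layers:

update = [[p, p - g * 0.05] for p, g in zip(w, gradient)]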

I suggest starting by going through the Theano tutorials.
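For completeness, here is a minimal sketch of my own (not from the original answer) showing how the updates list is typically consumed, in the style of Newmu's tutorial; it reuses X, Y, cost, y_pred, update and the mnist arrays from the code above:

train = theano.function(inputs=[X, Y], outputs=cost, updates=update,
                        allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_pred,
                          allow_input_downcast=True)

for epoch in range(10):
    # mini-batches of 128 examples
    for start in range(0, len(trX), 128):
        train(trX[start:start + 128], trY[start:start + 128])
    # accuracy on the test set after each epoch
    print epoch, np.mean(np.argmax(teY, axis=1) == predict(teX))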