Neural Network - Scipy minimize ValueError tnc: invalid gradient vector

I'm new to ML and have been trying to implement a neural network in Python, but when I use the minimize function with the 'tnc' method from the scipy library, I get the following error:

ValueError: tnc: invalid gradient vector.

I dug into it a bit and found this in the source code:


    arr_grad = (PyArrayObject *)PyArray_FROM_OTF((PyObject *)py_grad, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
    if (arr_grad == NULL)
    {
        PyErr_SetString(PyExc_ValueError, "tnc: invalid gradient vector.");
        goto failure;
    }

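So the error fires when that PyArray_FROM_OTF conversion fails, i.e. when whatever jac returns cannot be turned into a contiguous 1-D array of doubles. I'm not certain this is the exact failure mode in my code, but here is a minimal sketch of an input that this check rejects (the shapes match my [400 25 10] network):

    import numpy as np

    # A list of gradient matrices with different shapes, like the layers of my network
    grads = [np.zeros((25, 401)), np.zeros((10, 26))]

    # np.asarray on such a ragged list yields a dtype=object array, which cannot
    # be converted to NPY_DOUBLE (recent NumPy requires dtype=object explicitly)
    ragged = np.asarray(grads, dtype=object)
    print(ragged.dtype)                        # object

    # The conversion tnc needs is the equivalent of this, which must succeed:
    flat = np.concatenate([g.ravel() for g in grads])
    print(flat.dtype, flat.shape)              # float64 (10285,)
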
EDIT: Here is my implementation of the backpropagation and cost functions, as methods of the network class I created. I'm currently using a [400 25 10] architecture, similar to the one used in Andrew Ng's ML course on Coursera.

    def cost_function(self, theta, x, y):
        u = self.num_layers
        m = len(x)
        Reg = 0                                     # regularization term: init and accumulation
        for i in range(u - 1):
            k = np.power(theta[i], 2)
            Reg = Reg + np.sum(k)
        Reg = lmbda / (2 * m) * Reg                 # lmbda is a module-level regularization constant
        h = self.forwardprop(x)[-1]                 # activation of the last layer
        J = (-1 / m) * np.sum(np.multiply(y, np.log(h)) + np.multiply((1 - y), np.log(1 - h))) + Reg     # regularized cross-entropy cost
        return J
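
This assumes y is already one-hot encoded over the 10 output classes; a quick sketch of that encoding, with made-up labels:

    import numpy as np

    labels = np.array([0, 3, 9])                 # example integer labels
    Y = np.zeros((len(labels), 10))              # 10 output classes
    Y[np.arange(len(labels)), labels] = 1        # one row per example, a 1 in the label column

Without this encoding the element-wise product in the cost doesn't line up with the (m, 10) activation matrix.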

    def backprop(self, theta, x, y):
        m = len(x)                                              # number of training examples
        theta = np.asmatrix(theta)
        theta = self.rollPara(theta)                            # roll weights into matrices; original shape (1, 10285), after rolling [(25, 401), (26, 10)]
        tot_delta = list(range(self.num_layers - 1))            # accumulated error init
        delta = list(range(self.num_layers - 1))                # per-example error init
        for i in range(m):                                      # loop over examples to calculate the error
            a = self.forwardprop(x[i:i+1, :])                   # activations of each layer for the ith example
            delta[-1] = a[-1] - y[i]                            # error of the output layer for the ith example
            for j in range(1, self.num_layers - 1):             # loop to calculate the error of each layer for the ith example
                theta_ = theta[-1-j+1][:, 1:]                   # weights of the jth layer, counting from the back ('-1' is the last element; bias units excluded)
                act = (a[:-1])[-1-j+1][:, 1:]                   # activation of the current layer (output layer and bias units excluded)
                delta_prv = delta[-1-j+1]                       # error of the previous layer
                delta[-1-j] = np.multiply(delta_prv @ theta_, act)    # error of the current layer
            delta = delta[::-1]                                 # reverse the order, since backprop runs from back to front
            for j in range(self.num_layers - 1):                # add the ith example's error to the accumulated error
                tot_delta[j] = tot_delta[j] + np.transpose(delta[j]) @ a[self.num_layers-2-j]

        ThetaGrad = np.add((1/m) * np.asarray(tot_delta[::-1]), (lmbda/m) * np.asarray(theta))  # gradient with regularization
        grad = self.unrollPara(ThetaGrad)
        return grad
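
rollPara and unrollPara aren't shown above; this is only a rough sketch of what mine roughly do (the real implementations differ). The point that matters for the error is what unrollPara hands back to the optimizer:

    # Sketch only: shapes follow the comment in backprop, [(25, 401), (26, 10)]
    def rollPara(self, flat):
        flat = np.asarray(flat).ravel()           # (1, 10285) or (10285,) -> flat view
        w1 = np.asmatrix(flat[:25 * 401].reshape(25, 401))
        w2 = np.asmatrix(flat[25 * 401:].reshape(26, 10))
        return [w1, w2]

    def unrollPara(self, weights):
        # Flattening into one plain float64 vector avoids the dtype=object trap above
        return np.concatenate([np.asarray(w, dtype=np.float64).ravel() for w in weights])

With something like that, the optimizer call below gets a proper flat vector for x0.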

    maxiter = 500
    options = {'maxiter': maxiter}
    initTheta = N.unrollPara(N.weights)         # flatten the initial weights into a vector
    res = op.minimize(fun=N.cost_function, x0=initTheta, jac=N.backprop, method='tnc', args=(x, Y), options=options)   # x, Y are the already-initialized training set
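
A defensive variant I could have used (jac_flat is a name I made up, and it only helps if backprop already returns numeric data): coerce the gradient into the contiguous 1-D float64 shape before tnc sees it:

    import numpy as np
    import scipy.optimize as op

    def jac_flat(theta, x, y):
        # Hypothetical wrapper: force the gradient into a contiguous 1-D float64 array
        g = N.backprop(theta, x, y)
        return np.asarray(g, dtype=np.float64).ravel()

    res = op.minimize(fun=N.cost_function, x0=np.asarray(initTheta, dtype=np.float64).ravel(),
                      jac=jac_flat, method='tnc', args=(x, Y), options=options)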

The C snippet above is from the scipy source code.

Thanks in advance.

After reading through the code carefully, I realized the grad vector has to be a list and not a NumPy array. I'm not sure whether my implementation is otherwise correct, but the error is gone.
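
A quick sanity check on what the optimizer actually receives; the conversion here mirrors the one in the C snippet, and from the rest of that module it looks like the result also has to be a flat vector matching x0:

    import numpy as np

    g = N.backprop(initTheta, x, Y)
    print(type(g))                               # list vs. ndarray vs. matrix
    arr = np.asarray(g, dtype=np.float64)        # the conversion tnc performs;
    print(arr.ndim, arr.shape)                   # it must succeed and give a 1-D result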