Poor Accuracy of Gradient Descent Perceptron

Question

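I want to learn neural networks from scratch, which means starting out by playing with perceptrons. At the moment I am trying to implement batch gradient descent. The guide I am following provides the following pseudocode: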

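I have tried implementing it with some dummy data, as shown below, but found that it is not particularly accurate. It converges to what I assume is some local minimum.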

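My questions are:
Is there any way for me to check that this actually is a local minimum? I have been looking into how to plot it, but I am not sure how to go about doing that. Beyond that, is there a way to get a more accurate result using gradient descent? Or would I have to use a more complex method, or perhaps run it several times starting from different random weights to try to find the global minimum?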

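I looked around the forum before posting, but did not find much that made me confident that what I am doing here, or what is happening, is actually correct, so any help would be great.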

import numpy as np


def main():

    learningRate = 0.1
    np.random.seed(1)

    trainingInput = np.asmatrix([
              [1, -1],
              [2, 1],
              [1.5, 0.5],
              [2, -1],
              [1, 2]
            ])

    biasAccount = np.ones((5,1))
    trainingInput = np.append(biasAccount, trainingInput, axis=1)
    trainingOutput = np.asmatrix([
                [0],
                [1],
                [0],
                [0],
                [1]
            ])



    # Start from random weights drawn uniformly from [-1, 1).
    weights = 2 * np.random.random((3,1)) - 1

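    for iteration in range(10000):
        # Forward pass: linear output of the unit for all five examples at once.
        prediction = np.dot(trainingInput, weights)

        print("Weights: \n" + str(weights))
        print("Prediction: \n" + str(prediction))

        # Batch error: target minus prediction, one row per example.
        error = trainingOutput - prediction

        print("Error: \n" + str(error))

        # error.T times the inputs is the negative gradient of half the summed squared error.
        intermediateResult = np.dot(error.T, trainingInput)
        delta = learningRate * intermediateResult

        print("Delta: \n" + str(delta))

        # Step against the gradient; delta is a row vector, weights a column vector.
        weights += delta.T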


main()

Answer 1

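There is no guarantee that you will find the global minimum. Typically, people do several runs and pick the best one. More advanced approaches include decaying the learning rate, using an adaptive learning rate (e.g. RMSProp or Adam), or using GD with momentum.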
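
As a rough illustration of the "several runs" and momentum suggestions, here is a minimal sketch that reuses the training data from the question. The helper names `loss` and `fit`, the seeds, the learning rate of 0.05, and the momentum of 0.9 are assumptions picked for this example and do not come from the original post. One thing it makes visible: for a purely linear unit trained on squared error, as in the question's code, the loss is convex in the weights, so every restart should end up at the same weights as the least-squares solution from `np.linalg.lstsq`. Restarts are more likely to disagree once a nonlinearity or hidden layers are involved.

```python
import numpy as np

# Training data from the question, with a leading bias column of ones.
X = np.array([
    [1.0, 1.0, -1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.5, 0.5],
    [1.0, 2.0, -1.0],
    [1.0, 1.0, 2.0],
])
t = np.array([[0.0], [1.0], [0.0], [0.0], [1.0]])


def loss(w):
    """Half the summed squared error of the linear unit."""
    return 0.5 * float(np.sum((t - X @ w) ** 2))


def fit(seed, lr=0.05, momentum=0.9, steps=5000):
    """Batch gradient descent with classical momentum from a random start."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=(3, 1))
    velocity = np.zeros_like(w)
    for _ in range(steps):
        grad = -X.T @ (t - X @ w)              # gradient of loss(w)
        velocity = momentum * velocity - lr * grad
        w = w + velocity
    return w


# Several runs from different random starts; keep the one with the lowest loss.
runs = [fit(seed) for seed in range(5)]
best = min(runs, key=loss)

# Closed-form least-squares weights, used here only as a reference point.
w_exact = np.linalg.lstsq(X, t, rcond=None)[0]
spread = max(float(np.max(np.abs(w - w_exact))) for w in runs)

print("loss of best run:", loss(best))
print("largest deviation from least squares:", spread)
```

If `spread` comes out tiny, the runs did not get stuck in different minima, and any remaining inaccuracy is more likely a limit of the linear model itself than of the optimizer.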

There are several ways to monitor convergence:

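Use the loss (hint: (t - Xw)^T X is the negative of the derivative of the squared-error loss, up to a constant factor) and check for small values or small changes.
Early stopping: check whether the error on a (held-out) validation set is still decreasing; if not, stop training.
(Possibly, you could even check the distance between the weights at successive steps to see whether anything is still changing.)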
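
The loss check and the weight-distance check could be combined roughly as follows. This is only a sketch under assumptions: the function name `train_with_monitoring`, the tolerance `tol`, and the use of `np.random.default_rng` are made up for illustration. Early stopping is left out because five examples are too few to split off a meaningful validation set, but it would follow the same pattern with a validation loss in place of the training loss.

```python
import numpy as np

# Training data from the question, with a leading bias column of ones.
X = np.array([
    [1.0, 1.0, -1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.5, 0.5],
    [1.0, 2.0, -1.0],
    [1.0, 1.0, 2.0],
])
t = np.array([[0.0], [1.0], [0.0], [0.0], [1.0]])


def train_with_monitoring(lr=0.1, max_steps=10000, tol=1e-10, seed=1):
    """Batch gradient descent that stops once loss and weights stop changing."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=(3, 1))
    history = []                 # loss after each update, e.g. for plotting
    prev_loss = np.inf
    for _ in range(max_steps):
        error = t - X @ w
        grad = -X.T @ error      # gradient of 0.5 * sum(error ** 2)
        new_w = w - lr * grad
        current_loss = 0.5 * float(np.sum((t - X @ new_w) ** 2))
        history.append(current_loss)
        # Distance between the weights at successive steps.
        step_size = float(np.linalg.norm(new_w - w))
        w = new_w
        if abs(prev_loss - current_loss) < tol and step_size < tol:
            break                # nothing is changing any more
        prev_loss = current_loss
    return w, history


w, history = train_with_monitoring()
print("steps taken:", len(history))
print("final loss:", history[-1])
```

The `history` list is also what one would hand to a plotting library to look at the loss curve: a curve that flattens out smoothly suggests convergence, while one that oscillates or grows suggests the learning rate is too large.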

Tags: python, neural-network, gradient-descent