Neural Network seems to be getting stuck on a single output with each execution

I created a neural network to estimate the sin(x) function for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0), is written with numpy, and does not learn, since I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.

When I execute the code, the amount of test data it estimates correctly is around 48/1000. This happens to be the average number of data points per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just start picking a single output value for every input. For example, it may pick -0.5 as its estimate for y regardless of the x you give it. Where did I go wrong? This is my first network. Thank you!

import random
import numpy as np
import math
class Network(object):

    def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):

        #Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize

        # print(self.layer1)
        # print()
        # print(self.layer2)

        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):

        #Propagate forward through network as if doing this by hand.
        #first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))

        #second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))


        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)

        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        #apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0 #calculate loss for each minibatch

                #Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                    #adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                #print(adjustWeights)
                self.layer2 += adjustWeights
                #when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")

    def evaluate(self, test_data):
        """
        Returns number of test inputs which network evaluates correctly.
        The output is assumed to be the neuron in the output layer with the highest activation.
        :param test_data: test data set identical in form to train data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct+=1
        return(correct)

def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for Sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        #parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000,2000)/10)
        y_vals.append(round(math.sin(x_vals[i]),1))
    return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate

sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))

I didn't check all of your code thoroughly, but some problems are evident:

  • The * operator doesn't perform matrix multiplication in numpy; you have to use numpy.dot. For example, it affects these lines: network_input * self.layer1[neuron] and self.layer1_activations[weight]*self.layer2[neuron][weight] (see the sketch after this list).

  • It looks like you are solving the problem via classification (picking 1 out of 21 classes), yet using an L2 loss. This is somewhat mixed up. You have two options: either stick with classification and use a cross-entropy loss, or perform regression (i.e., predict the numeric value) with the L2 loss.

  • You should definitely extract the sigmoid function so you don't write the same expression over and over again:

    def sigmoid(z):
      return 1 / (1 + np.exp(-z))
    
    def sigmoid_derivative(x):
      return sigmoid(x) * (1 - sigmoid(x))
    
  • You apply the same update to self.layer1 and self.layer2, which is clearly wrong. Take some time to analyze how exactly backpropagation works.
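
For illustration, here is a minimal sketch (not the OP's code; the column-vector shapes and the toy layer sizes are my own assumptions) of how the two feedforward loops collapse into numpy.dot calls once sigmoid is extracted:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # Hypothetical vectorized feedforward: layer1 has shape (hidden, input),
    # layer2 has shape (output, hidden), and the input is a column vector.
    def feedforward(layer1, layer2, network_input):
        hidden_activations = sigmoid(np.dot(layer1, network_input))       # (hidden, 1)
        output_activations = sigmoid(np.dot(layer2, hidden_activations))  # (output, 1)
        return output_activations

    layer1 = np.random.randn(21, 1)
    layer2 = np.random.randn(21, 21)
    x = np.array([[10.0]])  # a single scalar input as a (1, 1) column vector
    print(feedforward(layer1, layer2, x).shape)  # -> (21, 1)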

I edited how my loss function was integrated into my function and also implemented gradient descent correctly. I also removed the use of mini-batches and simplified what my network is trying to do. I now have a network that tries to classify something as positive or negative.

Some extremely helpful guides I used to fix my problem:

Chapters 1 and 2 of Neural Networks and Deep Learning by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html. This book gives a thorough explanation of how neural networks work, including a breakdown of the math behind their execution.

Backpropagation from the Beginning by Erik Hallström, linked by Maxim: https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d. Not as thorough as the guide above, but I kept both open at the same time, since this one gets more to the point about what matters and how to apply the mathematical formulas that are explained in depth in Nielsen's book.

How to Build a Simple Neural Network in 9 Lines of Python Code, https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1. A useful and quick introduction to some neural network basics.

Here is my (now working) code:

import random
import numpy as np
import scipy
import math
class Network(object):

    def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):

        #Layers represented both by their weights array and activation and inputsums vectors.
        self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
        self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)

        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
        self.layer2_inputsums = np.zeros((outputLayerSize, 1))

        self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
        self.layer2_errorsignals = np.zeros((outputLayerSize, 1))

        self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
        self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        print()
        print(self.layer1)
        print()
        print(self.layer2)
        print()
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):
        #Calculate inputsum and activations for each neuron in the first layer
        for neuron in range(self.hiddenLayerSize):
            self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
            self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])

        # Calculate inputsum and activations for each neuron in the second layer. Notice that each neuron in the second layer is represented by
        # a weights vector, consisting of all weights leading out of the kth neuron in the (l-1)th layer to the jth neuron in layer l.
        self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])

        return self.layer2_activations

    def interpreted_output(self, network_input):
        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        self.feedforward(network_input)
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    # def build_expected_output(self, training_data):
    #     #Views expected output number y for each x to generate an expected output vector from the network
    #     index=0
    #     for pair in training_data:
    #         expected_output_vector = np.zeros((self.outputLayerSize,1))
    #         x = training_data[0]
    #         y = training_data[1]
    #         for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
    #             if y == i / 10:
    #                 expected_output_vector[i] = 1
    #                 #expect the target category to be a 1.
    #                 break
    #         training_data[index][1] = expected_output_vector
    #         index+=1
    #     return training_data

    def train(self, training_data, learn_rate):
        self.backpropagate(training_data, learn_rate)

    def backpropagate(self, train_data, learn_rate):
        #Perform for each x,y pair.
        for datapair in range(len(train_data)):
            x = train_data[datapair][0]
            y = train_data[datapair][1]
            self.feedforward(x)
           # print("l2a " + str(self.layer2_activations))
           # print("l1a " + str(self.layer1_activations))
           # print("l2 " + str(self.layer2))
           # print("l1 " + str(self.layer1))
            for neuron in range(self.outputLayerSize):
                #Calculate first error equation for error signals of output layer neurons
                self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])


            #Use recursive formula to calculate error signals of hidden layer neurons
            self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
            #print(self.layer1_errorsignals)
            # for neuron in range(self.hiddenLayerSize):
            #     #Use recursive formula to calculate error signals of hidden layer neurons
            #     self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])

            #Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
            #(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
            #Update all weights for network at each iteration of a training pair.

            #Update weights in second layer
            for neuron in range(self.outputLayerSize):
                for weight in range(self.hiddenLayerSize):
                    self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)

            self.layer2 += self.layer2_deltaw

            #Update weights in first layer
            for neuron in range(self.hiddenLayerSize):
                self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)

            self.layer1 += self.layer1_deltaw
            #Comment/Uncomment to enable error evaluation.
            #print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
            # print("l2a " + str(self.layer2_activations))
            # print("l1a " + str(self.layer1_activations))
            # print("l1 " + str(self.layer1))
            # print("l2 " + str(self.layer2))



    def evaluate(self, test_data):
        error = 0
        for x, y in test_data:
            #x is integer, y is single element np.array
            output = self.feedforward(x)
            error += y - output
        return error


#eval function for sin(x)
    # def evaluate(self, test_data):
    #     """
    #     Returns number of test inputs which network evaluates correctly.
    #     The ouput assumed to be neuron in output layer with highest activation
    #     :param test_data: test data set identical in form to train data set.
    #     :return: integer sum
    #     """
    #     correct = 0
    #     for x, y in test_data:
    #         outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
    #                                          1)]  # range(-10, 11, 1)
    #         newy = outputs[np.argmax(y)]
    #         output = self.interpreted_output(x)
    #         #print("output: " + str(output))
    #         if output == newy:
    #             correct+=1
    #     return(correct)

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        return (1 - self.sigmoid(z)) * self.sigmoid(z)

def build_simple_data(data_points):
    x_vals = []
    y_vals = []
    for each in range(data_points):
        x = random.randint(-3,3)
        expected_output_vector = np.zeros((1, 1))
        if x > 0:
            expected_output_vector[[0]] = 1
        else:
            expected_output_vector[[0]] = 0

        x_vals.append(x)
        y_vals.append(expected_output_vector)
    print(list(zip(x_vals,y_vals)))
    print()
    return (list(zip(x_vals,y_vals)))


simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))

def test_network(iterations,net,training_points):
    """
    Casually evaluates pre and post test
    :param iterations: number of trials to be run
    :param net: name of network to evaluate.
    :param training_points: size of training data to be used
    :return: four 1x1 arrays.
    """
    pretest_negative = 0
    pretest_positive = 0
    posttest_negative = 0
    posttest_positive = 0
    for each in range(iterations):
        pretest_negative += net.feedforward(-10)
        pretest_positive += net.feedforward(10)
    net.train(build_simple_data(training_points),.1)
    for each in range(iterations):
        posttest_negative += net.feedforward(-10)
        posttest_positive += net.feedforward(10)
    return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)

print(test_network(10000, simpleNet, 10000))

Although this code is quite different from the code posted in the OP, there is one particularly interesting difference. In the original feedforward method, notice:

    #second layer's output activations use layer1's activations as input:
    for neuron in range(self.outputLayerSize):
        for weight in range(self.hiddenLayerSize):
            self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
        self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))

self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]

is similar to

self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]

in the updated code. This line performs the dot product between each weight vector and each input vector (the activations from layer 1) to arrive at the input_sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights, by multiplying sigmoid_prime(z) by the network error between the actual and expected output. If z is very large (and positive), the neuron's activation will be very close to 1. That means the network is confident that this neuron should be activating, and the same holds if z is very negative. The network then doesn't want to radically adjust weights that it is happy with, so the scale of the change to each of a neuron's weights is given by the gradient of sigmoid(z), which is sigmoid_prime(z). A very large z means a very small gradient and a very small change applied to the weights (the gradient of the sigmoid is maximized at z = 0, when the network is unsure how a neuron should be classified and that neuron's activation is 0.5).
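
To illustrate that last point, here is a quick standalone sketch (not part of the network code above) showing how small sigmoid_prime(z) becomes once |z| is large:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(z):
        return sigmoid(z) * (1 - sigmoid(z))

    # The gradient peaks at z = 0 (0.25) and shrinks rapidly as |z| grows,
    # so a z that keeps growing effectively freezes the weight updates.
    for z in (0.0, 2.0, 10.0, -10.0):
        print(z, sigmoid_prime(z))
    # 0.0   -> 0.25
    # 2.0   -> ~0.105
    # 10.0  -> ~4.5e-05
    # -10.0 -> ~4.5e-05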

Because I kept adding to each neuron's input_sum (z) and never reset it to the new value of dot(weights, activations) for a new input, the value of z kept growing, continually slowing the rate at which the weights changed until weight adjustment ground to a halt. I added the following line to fix that:

self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))

The newly posted network can be copied and pasted into an editor and executed, as long as the numpy module is installed. The final line of output will be a list of 4 arrays representing the final network output. The first two are the pre-test values for a negative and a positive input, respectively; these should be random. The last two are the post-test values, which show how well the network classifies positive and negative numbers: a number close to 0 indicates negative, and a number close to 1 indicates positive.