感知器神经网络不会学习特定范围内的值

Perceptron Neural Network will not learn values in a specific range

我正在研究神经网络,并希望制作一个干净的 class 实现来处理任何大小的网络。目前,我正在调试我的学习函数来处理 2 层网络。

使用逻辑激活的当前状态:

相关代码如下:

import numpy as np

def logistic(x, deriv = False):
  '''
  If using the derivative, input must be result of logistic
  '''
  if deriv:
    return x*(1-x)
    
  return 1/(1+np.exp(-x))

def feed_forw(input, weights):
  '''
  ***Wrapper for input.dot(weights)
  Input should be a np.array the same length as number of input nodes
    - A row of input represents the vector of input nodes
    - Different Rows are different input cases
  Weights is a 2D np.array of weights for each input node to each output node
    - dimensions of weights will determine length of output vector
    - top row is weights going from first input to node to all output nodes
    - first col is weights going from all input nodes to first output node
  '''

  return input.dot(weights)

class ANN:
  '''
  Artificial Neural Network of Perceptron Design
  Member Attributes:
    Weights: tuple of np.array
    - # of elements define number of layers
    - shapes of each element define nodes of each connecting pair of connecting layers
    Bias: tuple of np.array
    - added to each node after the first layer on a per layer basis
    - must have same dimensions as output from each corresponding element in Weights
    Target: np.array
    - array representing desired output.
  '''
  
  def __init__(self, weights, bias = 0, target = None):
    self._weights = weights
    self._bias = bias
    self._target = target

  def __str__(self):
    data = ''
    for w,b in zip(self._weights, self._bias):
      data += f'Weight\n{w}\nbias\n{b}\n'

    return f'{data}Seeking\n{self._target}\n'

  def _forwardProp(self, v, activation):
    '''
    Helper function to Learn
    '''
    out = []
    out.append(v.copy())
    for w,b in zip(self._weights, self._bias):
      out.append(feed_forw(out[-1], w) + b)
      out.append(activation(out[-1]))
    return out

  def setTarget(self, target):
    self._target = target

  def learn(self, input, activation, epoch = 10, eta = 1, debug = False):
    '''
    ***Currently only functions with 2-Layer perceptrons***
    input: np.array
    - a matrix representing each of case of input vectors
    - rows are input vectors for a single case
    activation: function object
    - An activation function used to normalize output
    epoch: int
    - test cycles
    eta: int
    - learning parameter
    '''
    for e in range(epoch):
      layers = self._forwardProp(input, activation)
      #layers is a list for keeping track of changes between layers
      #Pattern follows:
      #[input, layer 0 - weighted sum, layer 1 - activation, layer 1(output) - 
      #   weighted sum, layer 2 - activation, layer 2 ... 
      #   weighted sum, output layer - activation, ouput layer]
      
      #Final element is always network output
      error = layers[-1] - self._target

      delta_out = error * activation(layers[-1], deriv = True)
      #derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
      #derivError_out = delta_out * layers[-3].T*self._weights[-1]
      #EDIT
      derivError_out = delta_out * layers[-3].T
      derivError_bias = delta_out * self._bias[-1].T
      self._weights += -eta*derivError_out
      self._bias += -eta*derivError_bias

      if debug:
        print(f'Epoch {e+1}:\nOutput:\n{layers[-1]}\nError is\n{error}\nDelta Out Node:\n{delta_out}')
        print(f'Weight Increment:\n{derivError_out}\nBias Increment:\n{derivError_bias}')
        print(f'State after training rotation:\n{self}')

      #i = 1
      #while i < len(layers) + 1:
        #This loop will count from the last element of layers, will go back by 2
        #...
        #i += 2

用于测试的代码及其输出:

w2 = np.array([[0.03],
               [-0.1]])
b2 = np.array([[0.7]])
nn1 = ANN((w2,), (b2,))
x = np.array([[1,1]])
t = np.array([[0.7]])
nn1.setTarget(t)
nn1.learn(x, logistic, 100, debug = True)
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[-0.04751054]]
Delta Out Node:
[[-0.01077287]]
Weight Increment:
[[-0.00032319]
 [ 0.00107729]]
Bias Increment:
[[-0.00754101]]
State after training rotation:
Weight
[[ 0.03032319]
 [-0.10107729]]
bias
[[0.70754101]]
Seeking
[[0.7]]

Epoch 2:
Output:
[[0.65402678]]
Error is
[[-0.04597322]]
Delta Out Node:
[[-0.01040263]]
Weight Increment:
[[-0.00031544]
 [ 0.00105147]]
Bias Increment:
[[-0.00736028]]
State after training rotation:
Weight
[[ 0.03063863]
 [-0.10212876]]
bias
[[0.71490129]]
Seeking
[[0.7]]
...
Epoch 99:
Output:
[[0.69871509]]
Error is
[[-0.00128491]]
Delta Out Node:
[[-0.00027049]]
Weight Increment:
[[-1.08348447e-05]
 [ 3.61161491e-05]]
Bias Increment:
[[-0.00025281]]
State after training rotation:
Weight
[[ 0.04006734]
 [-0.13355782]]
bias
[[0.93490471]]
Seeking
[[0.7]]

Epoch 100:
Output:
[[0.69876299]]
Error is
[[-0.00123701]]
Delta Out Node:
[[-0.00026038]]
Weight Increment:
[[-1.04328444e-05]
 [ 3.47761479e-05]]
Bias Increment:
[[-0.00024343]]
State after training rotation:
Weight
[[ 0.04007778]
 [-0.13359259]]
bias
[[0.93514815]]
Seeking
[[0.7]]
'''
#This cell is rerun with
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[0.25248946]]
Delta Out Node:
[[0.05725122]]
Weight Increment:
[[ 0.00171754]
 [-0.00572512]]
Bias Increment:
[[0.04007585]]
State after training rotation:
Weight
[[ 0.02828246]
 [-0.09427488]]
bias
[[0.65992415]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.64426676]]
Error is
[[0.24426676]]
Delta Out Node:
[[0.05598279]]
Weight Increment:
[[ 0.00158333]
 [-0.00527777]]
Bias Increment:
[[0.0369444]]
State after training rotation:
Weight
[[ 0.02669913]
 [-0.08899711]]
bias
[[0.62297975]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.50544009]]
Error is
[[0.10544009]]
Delta Out Node:
[[0.0263569]]
Weight Increment:
[[ 2.73123106e-05]
 [-9.10410354e-05]]
Bias Increment:
[[0.00063729]]
State after training rotation:
Weight
[[ 0.00100894]
 [-0.00336312]]
bias
[[0.02354185]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.50529672]]
Error is
[[0.10529672]]
Delta Out Node:
[[0.02632123]]
Weight Increment:
[[ 2.65564469e-05]
 [-8.85214898e-05]]
Bias Increment:
[[0.00061965]]
State after training rotation:
Weight
[[ 0.00098238]
 [-0.0032746 ]]
bias
[[0.0229222]]
Seeking
[[0.4]]
'''
#Cell is rerun again with
b2 = np.array([[-0.7]])
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.31647911]]
Error is
[[-0.08352089]]
Delta Out Node:
[[-0.01806725]]
Weight Increment:
[[-0.00054202]
 [ 0.00180672]]
Bias Increment:
[[0.01264707]]
State after training rotation:
Weight
[[ 0.03054202]
 [-0.10180672]]
bias
[[-0.71264707]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.31347742]]
Error is
[[-0.08652258]]
Delta Out Node:
[[-0.01862047]]
Weight Increment:
[[-0.00056871]
 [ 0.00189569]]
Bias Increment:
[[0.01326982]]
State after training rotation:
Weight
[[ 0.03111072]
 [-0.10370241]]
bias
[[-0.72591689]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.01206264]]
Error is
[[-0.38793736]]
Delta Out Node:
[[-0.0046231]]
Weight Increment:
[[-0.00079352]
 [ 0.00264508]]
Bias Increment:
[[0.01851554]]
State after training rotation:
Weight
[[ 0.17243664]
 [-0.57478879]]
bias
[[-4.02352151]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.01182232]]
Error is
[[-0.38817768]]
Delta Out Node:
[[-0.0045349]]
Weight Increment:
[[-0.00078198]
 [ 0.00260661]]
Bias Increment:
[[0.01824629]]
State after training rotation:
Weight
[[ 0.17321862]
 [-0.5773954 ]]
bias
[[-4.04176779]]
Seeking
[[0.4]]
'''

我可以看到,当输出小于 0.5 时,出于某种原因,无论如何都会使输出降低。如果起始输出小于0.5,它只会学习一个小于起始输出的值。如果起始输出为 0.5 或更大,它只会学习一个也大于 0.5 的值。然而,我仍然无法找到解决该问题的方法(至少优雅地)。

这是两个有争议的案例,所以我可以暴力破解。但是,我不会知道我犯了什么错误。

我知道有多种方法可以实现这个网络,甚至还存在这个 blog 上看到的可笑的简单变体,我仍然无法理解它的数学原理。但是,在这件事上工作了几周之后,我只能假设这是一些我没能看到的小错误。

这一编辑向解决方案迈出了一步。

在 class 定义中更改了以下行。

#derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
#derivError_out = delta_out * layers[-3].T*self._weights[-1]
derivError_out = delta_out * layers[-3].T

变化

立即出现的一个问题是您的 S 形函数导数似乎不正确。 sigmoid(x) 的导数不等于 (x)*(1-x) 而是 sigmoid(x)(1-sigmoid(x))

您可以相应地更改您的实现

def logistic(x, deriv=False):
    """
    If using the derivative, input must be result of logistic
    """
    phi = 1 / (1 + np.exp(-x))

    if deriv:
        return phi * (1 - phi)

    return phi

我看到两件我非常怀疑的事情是否正在做你正在做的事情:

  1. 您正在优化的错误不是理想的成本函数。
    梯度下降最小化了您正在制定的 error 项。
    查看当前公式 error = prediction - target 正确运行的梯度优化器将实现的最佳结果是只输出最小的(如果激活允许它甚至是负的)预测可能。

    建议:使用一些 L 范数作为误差函数,例如L1-范数
    error = | prediction - target |
  1. 也许我只是没法正确阅读它没关系,但权重更新
    derivError_out = delta_out * layers[-3].T*self._weights[-1]
    看起来很可疑,一旦你构建了你想要反向传播的错误项 delta_out 对各个权重的贡献仅取决于它们各自的激活 layers[-3].

我建议如下:(推导如果你有兴趣https://imgur.com/gallery/noS4pe4

def learn(self, input, activation, epoch=10, eta=1, debug=False):
    """ .... optimizing L1-norm .... """
    for e in range(epoch):
        layers = self._forwardProp(input, activation)

        diff = layers[-1] - self._target
        error = np.abs(diff)

        delta_out = np.sign(diff) * activation(layers[-2], deriv=True)
        derivError_out = delta_out * layers[-3].T
        self._weights -= eta * derivError_out
        self._bias -= eta * delta_out

另外:你的学习率eta好像比较高;这肯定会发散并产生一个具有爆炸权重的模型,最终根据初始化产生 0 或 1。

调低它以获得更稳定的梯度下降。

例如

nn1.learn(x, logistic, 100, eta=1e-2, debug=True)

递增神经网络参数的正确推导可以在互联网上找到。在感知器的情况下,每个偏差都会增加一个“delta”值,该值由它所连接的输出节点决定。我的偏差节点不是简单地实现它,而是通过自身和这个“增量”的乘积来增加。

我无能把这个表达式写成

delta_out = error * activation(layers[-1], deriv = True)
derivError_bias = delta_out * self._bias[-1].T
self._bias -= eta * derivError_bias

而不是self._bias -= eta * delta_out

网络现在可以学习任何具有随机分配的权重和偏差的随机分配的目标。