Why is this AND gate neural network not moving towards optimal weights?
I have a simple neural network with two inputs and one output and no hidden layers, i.e.

[input1, input2] · [weight1, weight2] = z
output = sigmoid(z)

The weights do not seem to move towards the optimal values. As far as I can tell I have checked the gradient, and I can see the weights rise or fall according to the derivative of the cost function, but the network still does not move towards the optimal values.
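For concreteness, a tiny standalone restatement of that forward pass (just an illustration of the formula above, separate from the class below; the function name is made up):

import numpy as np

def forward(inputs, weights):
    # inputs = [input1, input2], weights = [weight1, weight2]
    z = np.dot(weights, inputs)        # z = weight1*input1 + weight2*input2
    return 1.0 / (1.0 + np.exp(-z))    # output = sigmoid(z)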
Here is the code:
import numpy as np
import random as r
import sys

def sigmoid(ip, derivate=False):
    if derivate:
        return ip*(1-ip)
    return 1.0/(1+np.exp(-1*ip))

class NeuralNet:
    global sigmoid

    def __init__(self):
        self.inputLayers = 2
        self.outputLayer = 1

    def setup(self):
        # random initial inputs and weights
        self.i = np.array([r.random(), r.random()], dtype=float).reshape(2,)
        self.w = np.array([r.random(), r.random()], dtype=float).reshape(2,)

    def forward_propogate(self):
        # z is the elementwise product w*i; output is sigmoid of its sum
        self.z = self.w*self.i
        self.o = sigmoid(sum(self.z))

    def optimize_cost(self, desired):
        i = 0
        current_cost = pow(desired - self.o, 2)
        for weight in self.w:
            # derivative of the squared-error cost with respect to weight i
            dpdw = -1 * (desired-self.o) * (sigmoid(self.o, derivate=True)) * self.i[i]
            print(dpdw)
            self.w[i] = self.w[i] + 500*dpdw
            i += 1
        self.forward_propogate()

    def train(self, ip, op):
        self.i = np.array(ip).reshape(2,)
        self.forward_propogate()
        print("before:{}".format(self.o))
        self.optimize_cost(op[0])
        # print(self.i,self.w)

n = NeuralNet()
n.setup()
# while sys.stdin.read(1):
while True:
    # random inputs; AND-like target: 0.9 if both > 0.5, else 0.1
    a = r.random()
    b = r.random()
    if a > 0.5 and b > 0.5:
        c = 0.9
    else:
        c = 0.1
    print(c)
    n.train([a, b], [c])
    print(n.i, n.w)
    print("after: {}".format(n.o))
I have read this https://towardsdatascience.com/emulating-logical-gates-with-a-neural-network-75c229ec4cc9, and also that a deeper network (with one or more hidden layers) is needed to get good training results. The reason given is:
Training and Learning

Now we have shown that this neural network is possible, now the remaining question is, it is possible to train. Can we expect that if we simply fed in the data drawn from the graph above after defining the layers, number of neurons and activation functions correctly, the network will train in this way?

No, not always, and not even often. The problem, like with many neural networks is one of optimization. In training this network it will often get stuck in a local minimum even though a near-perfect solution exists. This is where your optimization algorithm may play a large role, and this is something which Tensorflow Playground doesn’t allow you to change and may be the subject of a future post.

[...]

After you have built this network by manually inputting the weights, why not try to train the weights of this this network from scratch instead of constructing it in manually. I have managed to do this after many trials, but I believe it is quite sensitive to the seeding and often ends up in local minimums. If you find a reliable way to train this network using these features and this network structure please reach out in the comments.

Try to build this network using the only this number of neurons and layers. In this article I have shown that it is possible to do it with this many neurons only. If you introduce any more nodes then you will certainly have some redundant neurons. Although, with more neurons/layers, I have had better luck in training a good model more consistently.
Possibly the problem is related to the multiplication problem of neural networks. A flat (i.e. not deep, no hidden layer) neural network cannot perform simple multiplication, cf https://stats.stackexchange.com/questions/217703/can-deep-neural-network-approximate-multiplication-function-without-normalizatio
Update (from the comments)
Honestly, I am not sure about the MSE error function, because it is not good for classification problems, cf https://towardsdatascience.com/why-using-mean-squared-error-mse-cost-function-for-binary-classification-is-a-bad-idea-933089e90df7 and https://medium.com/autonomous-agents/how-to-teach-logic-to-your-neuralnetworks-116215c71a49 (which uses a negative log likelihood error function, also known as multiclass cross-entropy) and also https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/ :
Mean Squared Error Loss
The Mean Squared Error, or MSE, loss is the default loss to use for regression [not classification] problems.
Training on two labels or classes (True, False) is a classification problem, not a regression problem.
However, I think the main systematic problem is that the network is not deep enough. As the article https://towardsdatascience.com/emulating-logical-gates-with-a-neural-network-75c229ec4cc9 says, you can seed the initial weight combination to avoid local minima, but that does not fix the underlying problems either (the network is not deep enough, and the error function (MSE) is the wrong one).
In https://towardsdatascience.com/lets-code-a-neural-network-in-plain-numpy-ae7e74410795 there is a numpy implementation of a neural network for classification, including an implementation of the binary cross-entropy error function, which you can compare against your code.
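For illustration only (this is not code from the linked post; the function names are made up), a minimal numpy sketch of binary cross-entropy and its weight gradient for a single sigmoid output:

import numpy as np

def binary_cross_entropy(y, o, eps=1e-12):
    # y: target label (0 or 1), o: sigmoid output in (0, 1)
    o = np.clip(o, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(o) + (1 - y) * np.log(1 - o))

def bce_weight_gradient(y, o, inputs):
    # with a sigmoid output, dL/dz simplifies to (o - y),
    # so dL/dw_k = (o - y) * input_k (and dL/db = o - y)
    return (o - y) * np.asarray(inputs, dtype=float)

Unlike the squared-error gradient, this one has no o*(1-o) factor, so it does not shrink towards zero when the sigmoid saturates.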
Answering my own question. All I needed was a BIAS. Without a BIAS, the sigmoid cannot deviate from 0.
Here is a sigmoid with a bias of 2. Now sigmoid(0) is close to 0.1.
After including a BIAS node in the network, I was able to get results.
Success rate: 99.00000042272556% Network trained, took: 2365601 trials
Network weights:[14.0435016 14.04351048]
Bias: 21.861074330808844
Enter INPUT:
0
1
Network output:0.00040243926180320134
Enter INPUT:
1
1
Network output:0.9980264340845117
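For completeness, a minimal sketch of how a bias term can be wired into the forward pass and the gradient step. This is only an illustration assuming plain gradient descent on the squared error; it is not the exact code that produced the output above, and the class name and learning rate are made up:

import numpy as np
import random as r

def sigmoid(ip, derivate=False):
    # when derivate=True, ip is assumed to already be a sigmoid output
    if derivate:
        return ip * (1 - ip)
    return 1.0 / (1 + np.exp(-ip))

class NeuralNetWithBias:
    def setup(self):
        self.w = np.array([r.random(), r.random()], dtype=float)
        self.b = r.random()  # the extra BIAS term

    def forward_propogate(self, inputs):
        self.i = np.asarray(inputs, dtype=float)
        self.o = sigmoid(np.dot(self.w, self.i) + self.b)  # z = w . i + b
        return self.o

    def optimize_cost(self, desired, lr=1.0):
        # dC/dz for C = (desired - o)^2, up to a constant factor of 2
        delta = -(desired - self.o) * sigmoid(self.o, derivate=True)
        self.w -= lr * delta * self.i  # dC/dw_k = delta * input_k
        self.b -= lr * delta           # dC/db = delta, since the bias "input" is always 1

The bias gets the same kind of update as the weights, with its input fixed at 1, which lets sigmoid(z) move away from 0.5 even when both inputs are 0.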