Does the correction to the weights also include the derivative of the Sigmoid function?

Question

Let us look at how the following line is used in the code block given below:

L1_delta = L1_error * nonlin(L1,True) # line 36

import numpy as np #line 1

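# sigmoid function
# With deriv=True, x is expected to already be a sigmoid output s,
# so x*(1-x) is the slope of the sigmoid, s*(1-s), at that point.
def nonlin(x,deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))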

# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])

# output dataset            
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for iter in range(1000):

    # forward propagation
    L0 = X
    L1 = nonlin(np.dot(L0,syn0))

    # how much did we miss?
    L1_error = y - L1

    # multiply how much we missed by the 
    # slope of the sigmoid at the values in L1
    L1_delta = L1_error * nonlin(L1,True) # line 36

    # update weights
    syn0 += np.dot(L0.T,L1_delta)

print ("Output After Training:")
print (L1)

I want to know: is this line required? Why do we need the factor of the derivative of the Sigmoid?

I have seen many similar logistic regression examples in which the derivative of the Sigmoid is not used. For example: https://github.com/chayankathuria/LogReg01/blob/master/GradientDescent.py

Answer 1

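Yes, that line is indeed required. You need the derivative of the activation function (the sigmoid in this case) because your final output depends on the weights only implicitly, through the activation. To differentiate the error with respect to the weights you therefore have to apply the chain rule, and that is where the derivative of the sigmoid appears.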
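To spell out the chain-rule step, here is a sketch of the derivation. It assumes the code is minimizing a squared-error loss $E$; the code never states its loss, so $E$ is an assumption on my part. Here $w$ stands for `syn0`, $z = L_0 w$ is the pre-activation, and $\sigma$ is the sigmoid.

```latex
E = \tfrac{1}{2}\sum_i \left(y_i - L_{1,i}\right)^2,
\qquad L_1 = \sigma(z),
\qquad z = L_0\, w

\frac{\partial E}{\partial w}
  = L_0^{\top}\left[\frac{\partial E}{\partial L_1} \odot \frac{\partial L_1}{\partial z}\right]
  = -\,L_0^{\top}\left[(y - L_1) \odot L_1 (1 - L_1)\right]

w \leftarrow w - \frac{\partial E}{\partial w}
  = w + L_0^{\top}\,\underbrace{\left[(y - L_1) \odot L_1 (1 - L_1)\right]}_{\texttt{L1\_delta}}
```

Under that assumption the last line matches `syn0 += np.dot(L0.T,L1_delta)` with a learning rate of 1. As for the examples that omit the factor: with a cross-entropy loss, which logistic regression implementations commonly use, $\partial E/\partial z = L_1 - y$ and the $\sigma'(z)$ term cancels algebraically. That may be what the linked example does, but I have not verified it.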

I suggest you take a look at this post about backpropagation: https://datascience.stackexchange.com/questions/28719/a-good-reference-for-the-back-propagation-algorithm

It explains the math behind backpropagation well.
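If you would rather check the chain-rule claim numerically than take it on faith, the following is a minimal sketch. It assumes a squared-error loss (the question's code does not name its loss), and the names `sigmoid`, `loss`, `analytic_grad`, `numeric_grad`, and the starting weights `w` are all hypothetical, chosen only for this illustration. It compares the gradient that includes the sigmoid derivative against a finite-difference estimate, and then does the same for the version that leaves the derivative out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    # Assumed loss: half the sum of squared errors.
    a = sigmoid(X @ w)
    return 0.5 * np.sum((y - a) ** 2)

def analytic_grad(w, X, y):
    a = sigmoid(X @ w)
    delta = (y - a) * a * (1 - a)  # same quantity as L1_delta in the question
    return -X.T @ delta            # dE/dw; the question's update adds X.T @ delta

def numeric_grad(w, X, y, eps=1e-6):
    # Central finite differences, one weight at a time.
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        g.flat[i] = (loss(w + step, X, y) - loss(w - step, X, y)) / (2 * eps)
    return g

# Same dataset as in the question.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([[0, 0, 1, 1]], dtype=float).T

# Arbitrary fixed starting weights (hypothetical values for the check).
w = np.array([[0.1], [-0.2], [0.3]])

g_analytic = analytic_grad(w, X, y)
g_numeric = numeric_grad(w, X, y)
grad_without_derivative = -X.T @ (y - sigmoid(X @ w))

print(np.allclose(g_analytic, g_numeric, atol=1e-6))
print(np.allclose(grad_without_derivative, g_numeric, atol=1e-6))
```

The first check should print `True` and the second `False`: for a squared-error loss, dropping the sigmoid derivative gives a vector that is no longer the gradient of the loss.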

neural-network

gradient-descent