Backpropagation with Momentum
I'm following this tutorial to implement the backpropagation algorithm. However, I'm stuck on adding momentum to the algorithm.
Without momentum, this is the code for the weight-update method:
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
And below is my implementation:
def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                previous_weight = neuron['weights'][j]
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j] + momentum * previous_weight
            previous_weight = neuron['weights'][-1]
            neuron['weights'][-1] += l_rate * neuron['delta'] + momentum * previous_weight
This gives me a math overflow error, because the weights grow exponentially large over multiple epochs. I believe my previous_weight update logic is wrong.
Let me give you a hint. In your implementation you multiply momentum by previous_weight, which is another parameter of the network at the same step. That obviously blows up quickly.
What you should do instead is remember the whole update vector, l_rate * neuron['delta'] * inputs[j], accumulated over the previous backpropagation steps. It could look something like this:
velocity[j] = l_rate * neuron['delta'] * inputs[j] + momentum * velocity[j]
neuron['weights'][j] += velocity[j]
... where velocity is an array of the same length as network, defined in a scope wider than updateWeights and initialized with zeros. For details, see this post.
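Putting the hint together, a full momentum-enabled update could look like the sketch below. This is a minimal example, not a definitive implementation: it assumes the same network/neuron dict layout as the tutorial code above, and the velocities structure (one zero-initialized slot per weight, including the bias) is a name introduced here for illustration and must be created once outside the function so it persists across calls.

```python
def update_weights_momentum(network, row, l_rate, velocities, momentum=0.5):
    """Sketch of the tutorial's update_weights with classical momentum.

    `velocities` mirrors the shape of the weights and must persist across
    calls; initialize it to zeros once, outside this function (see below).
    """
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for n, neuron in enumerate(network[i]):
            for j in range(len(inputs)):
                # Blend the current gradient step with the remembered update,
                # instead of multiplying momentum by the weight itself.
                velocities[i][n][j] = (l_rate * neuron['delta'] * inputs[j]
                                       + momentum * velocities[i][n][j])
                neuron['weights'][j] += velocities[i][n][j]
            # The bias weight gets its own velocity entry (last slot).
            velocities[i][n][-1] = (l_rate * neuron['delta']
                                    + momentum * velocities[i][n][-1])
            neuron['weights'][-1] += velocities[i][n][-1]


# Created once, before training starts:
# velocities = [[[0.0] * len(neuron['weights']) for neuron in layer]
#               for layer in network]
```

With momentum = 0, each velocity entry reduces to the plain update l_rate * neuron['delta'] * inputs[j], so the function degenerates to the tutorial's original update_weights.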