Are weights/biases only updated once per mini-batch?
I'm working through a neural-network tutorial and I have a question about the function that updates the weights.
def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The "mini_batch" is a list of tuples "(x, y)", and "eta"
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases] #Initialize bias matrix with 0's
    nabla_w = [np.zeros(w.shape) for w in self.weights] #Initialize weights matrix with 0's
    for x, y in mini_batch: #For tuples in one mini_batch
        delta_nabla_b, delta_nabla_w = self.backprop(x, y) #Calculate partial derivatives of bias/weights with backpropagation, set them to delta_nabla_b
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] #Generate a list with partial derivatives of bias of every neuron
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] #Generate a list with partial derivatives of weights for every neuron
    self.weights = [w-(eta/len(mini_batch))*nw #Update weights according to update rule
                    for w, nw in zip(self.weights, nabla_w)] #The author zips the two lists he needs (current weights and partial derivatives), then computes with them
    self.biases = [b-(eta/len(mini_batch))*nb #Update biases according to update rule
                   for b, nb in zip(self.biases, nabla_b)]
What I don't understand here is that a for loop is used to compute nabla_b and nabla_w (the partial derivatives of the weights/biases). Backpropagation is run for every training example in the mini-batch, but the weights/biases are only updated once.
It seems to me that with a mini-batch of size 10 we compute nabla_b and nabla_w 10 times, and only after the for loop finishes are the weights and biases updated. But doesn't the for loop reset the nabla_b and nabla_w lists on every pass? Why don't we update self.weights and self.biases inside the for loop?
The neural network works just fine, so I figure I'm making a small error in my thinking somewhere.
FYI: the relevant part of the tutorial I'm following can be found here.
No. The weights/biases are updated only once, after the whole mini-batch has been processed, but that single update incorporates the contribution of every training example accumulated during the loop. The canonical description says we compute the average of all the per-example gradients and adjust by that average; summing the gradients and dividing by the batch size, as this code does, is arithmetically equivalent.
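A quick numeric check of that equivalence (a minimal sketch, not part of the tutorial; the gradient values and eta below are made up purely for illustration):

import numpy as np

eta = 3.0
w = np.array([1.0, 1.0])
# three made-up per-example gradients for a mini-batch of size 3
grads = [np.array([0.2, -0.1]), np.array([0.4, 0.3]), np.array([-0.3, 0.5])]

# tutorial style: accumulate the gradients, then scale by eta / batch size
w_sum_style = w - (eta / len(grads)) * sum(grads)

# "average first" style from the canonical description
w_avg_style = w - eta * np.mean(grads, axis=0)

print(np.allclose(w_sum_style, w_avg_style))  # True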
First, the gradient arrays for the biases and weights are initialized to zero.
nabla_b = [np.zeros(b.shape) for b in self.biases] #Initialize bias matrix with 0's
nabla_w = [np.zeros(w.shape) for w in self.weights] #Initialize weights matrix with 0's
Then, for each observation in the mini-batch, that training example's results are added into the bias and weight arrays:
for x, y in mini_batch: #For tuples in one mini_batch
    delta_nabla_b, delta_nabla_w = self.backprop(x, y) #Calculate partial derivatives of bias/weights with backpropagation, set them to delta_nabla_b
    nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] #Generate a list with partial derivatives of bias of every neuron
    nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] #Generate a list with partial derivatives of weights for every neuron
Finally, each weight and bias is adjusted once, by the accumulated contribution of every training example:
self.weights = [w-(eta/len(mini_batch))*nw #Update weights according to update rule
                for w, nw in zip(self.weights, nabla_w)] #The author zips the two lists he needs (current weights and partial derivatives), then computes with them
self.biases = [b-(eta/len(mini_batch))*nb #Update biases according to update rule
               for b, nb in zip(self.biases, nabla_b)]
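Written out as a formula, that last step is the standard mini-batch gradient-descent rule, with m = len(mini_batch):

w  ->  w - (eta/m) * sum_x dC_x/dw
b  ->  b - (eta/m) * sum_x dC_x/db

where the sums over the training examples x are exactly what nabla_w and nabla_b hold after the loop.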
The key to understanding how this loop accumulates the contribution of every training example into the bias and weight gradients is to note the evaluation order in Python. Specifically, everything on the right-hand side of the = sign is evaluated before it is assigned to the variable on the left-hand side of the = sign.
Here is a simpler example that may be easier to follow:
nabla_b = [0, 0, 0, 0, 0]
for x in range(10):
    delta_nabla_b = [-1, 2, -3, 4, -5]
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
In this example we have just five scalar biases and a constant gradient for each. What is nabla_b at the end of this loop? Consider the comprehension expanded using the definition of zip, and remember that everything on the right-hand side of the = sign is evaluated before it is written to the variable name on the left:
nabla_b = [0, 0, 0, 0, 0]
for x in range(10):
    # nabla_b is defined outside of this loop
    delta_nabla_b = [-1, 2, -3, 4, -5]
    # expand the comprehension and the zip() function
    temp = []
    for i in range(len(nabla_b)):
        temp.append(nabla_b[i] + delta_nabla_b[i])
    # now that the RHS is calculated, set it to the LHS
    nabla_b = temp
At this point it should be clear that each element of nabla_b is added to the corresponding element of delta_nabla_b inside the comprehension, and the result then overwrites nabla_b for the next iteration of the loop.
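To answer the question concretely (the numbers follow directly from the example above): each of the 10 iterations adds [-1, 2, -3, 4, -5] element-wise to the running totals, so after the loop:

print(nabla_b)   # [-10, 20, -30, 40, -50]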
So in the tutorial example, nabla_b and nabla_w are the sums of the partial derivatives, with each training example in the mini-batch added in exactly once. They technically are reset on every training example, but they are reset to their previous value plus the new gradient, which is exactly what you want. A clearer (but less concise) way to write it might be:
def update_mini_batch(self, mini_batch, eta):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        # expanding the comprehensions
        for i in range(len(nabla_b)):
            nabla_b[i] += delta_nabla_b[i]  # set the value of each element directly
        for i in range(len(nabla_w)):
            nabla_w[i] += delta_nabla_w[i]
    self.weights = [w-(eta/len(mini_batch))*nw  # note that this comprehension uses the same trick
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
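If you want to see the accumulation happen end to end, here is a self-contained toy you can run. Note that fake_backprop, the layer shapes, the batch size, and eta are all made up for illustration; this is not the tutorial's Network class, just the same accumulation pattern:

import numpy as np

biases = [np.zeros((3, 1)), np.zeros((2, 1))]
weights = [np.ones((3, 4)), np.ones((2, 3))]

def fake_backprop(x, y):
    # stand-in for self.backprop: pretend every example yields a gradient of all 1's
    return ([np.ones(b.shape) for b in biases],
            [np.ones(w.shape) for w in weights])

mini_batch = [(None, None)] * 10   # 10 dummy (x, y) pairs
eta = 0.5

nabla_b = [np.zeros(b.shape) for b in biases]
nabla_w = [np.zeros(w.shape) for w in weights]
for x, y in mini_batch:
    delta_nabla_b, delta_nabla_w = fake_backprop(x, y)
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
    nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

print(nabla_b[0][0, 0])   # 10.0 -- the per-example gradients really did accumulate
new_weights = [w - (eta / len(mini_batch)) * nw
               for w, nw in zip(weights, nabla_w)]
print(new_weights[0][0, 0])   # 1 - 0.5 * (10/10) = 0.5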