Is there an error in the way the params are updated in the following Theano method?
I was going through an online tutorial on momentum-based learning and came across this method in Theano:
import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    '''
    Compute updates for gradient descent with momentum

    :parameters:
        - cost : theano.tensor.var.TensorVariable
            Theano cost function to minimize
        - params : list of theano.tensor.var.TensorVariable
            Parameters to compute gradient against
        - learning_rate : float
            Gradient descent learning rate
        - momentum : float
            Momentum parameter, should be at least 0 (standard gradient descent) and less than 1

    :returns:
        updates : list
            List of updates, one for each parameter
    '''
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param in params:
        # For each parameter, we'll create a param_update shared variable.
        # This variable will keep track of the parameter's update step across iterations.
        # We initialize it to 0
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Each parameter is updated by taking a step in the direction of the gradient.
        # However, we also "mix in" the previous step according to the given momentum value.
        # Note that when updating param_update, we are using its old value and also the new gradient step.
        updates.append((param, param - learning_rate*param_update))
        # Note that we don't need to derive backpropagation to compute updates - just use T.grad!
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates
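For context, this is roughly how I understand the returned update list is meant to be used. The data variables X and y, the linear model, and the squared-error cost below are just my own illustration, not from the tutorial; only gradient_updates_momentum comes from the code above:

import numpy as np
import theano
import theano.tensor as T

# Hypothetical model: a single linear layer trained with squared error.
X = T.matrix('X')
y = T.vector('y')
W = theano.shared(np.zeros(3), name='W')
b = theano.shared(0., name='b')
prediction = T.dot(X, W) + b
cost = T.mean((prediction - y) ** 2)

# The updates are applied once per call to train(), after the outputs
# (here, the cost) have been computed from the current shared values.
train = theano.function(
    [X, y], cost,
    updates=gradient_updates_momentum(cost, [W, b], learning_rate=0.01, momentum=0.9))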
Shouldn't the order of the following two lines be reversed (i.e., swapped)?

updates.append((param, param - learning_rate*param_update))

and

updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))

My understanding is that the updates only run after the training function has been executed and the cost computed. Is that right?

Doesn't that mean that, using the current cost and the existing param_update value (from the previous iteration), we should first compute the newer param_update and only then use it to update the current parameter value?

Why is it the other way around, and why is that correct?
The order of the updates in the updates list supplied to theano.function is ignored. An update is always computed using the old values of the shared variables.

This code demonstrates that the update order is ignored:
import theano
import theano.tensor

p = 0.5
param = theano.shared(1.)
param_update = theano.shared(2.)
cost = 3 * param * param

# Both update expressions refer to the *old* values of param and param_update.
update_a = (param, param - param_update)
update_b = (param_update, p * param_update + (1 - p) * theano.grad(cost, param))

# The same two updates, listed in opposite orders.
updates1 = [update_a, update_b]
updates2 = [update_b, update_a]

f1 = theano.function([], outputs=[param, param_update], updates=updates1)
f2 = theano.function([], outputs=[param, param_update], updates=updates2)

print f1(), f1()

# Reset the shared variables and repeat with the reversed ordering.
param.set_value(1)
param_update.set_value(2)
print f2(), f2()
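Both runs should print the same values: each call returns the pre-update values and then applies both updates from those old values. So the first call should return param = 1.0, param_update = 2.0 and leave behind param = 1 - 2 = -1 and param_update = 0.5*2 + 0.5*(6*1) = 4, whichever update comes first in the list.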
If, logically, what you want is
new_a = old_a + a_update
new_b = new_a + b_update
then you need to supply the updates like this:
new_a = old_a + a_update
new_b = old_a + a_update + b_update
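As a minimal sketch of that pattern in Theano (the shared variables a and b and the increments a_update and b_update are hypothetical, purely for illustration): reuse the symbolic expression for new_a inside the expression for new_b, since each update is always evaluated against the old shared values.

import theano

# Hypothetical shared variables and increments, purely for illustration.
a = theano.shared(1.)
b = theano.shared(0.)
a_update = 2.
b_update = 3.

# Chain the updates symbolically: new_b reuses the expression for new_a,
# which is equivalent to old_a + a_update + b_update.
new_a = a + a_update
new_b = new_a + b_update

step = theano.function([], [], updates=[(a, new_a), (b, new_b)])
step()
print a.get_value(), b.get_value()   # expect 3.0 and 6.0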