How to implement separate learning rates or optimizers for different layers in Chainer?
In my neural network, I want to use a different learning rate or a different optimizer (e.g. AdaGrad) in each layer. How can this be implemented? Thanks for your help.
After you set up the optimizer for the model, every parameter of each link in the model has an update_rule attribute (e.g. AdaGradRule in this example), which defines how that parameter is updated.
Each update_rule has its own hyperparam attribute, so you can override the hyperparam separately for each parameter in a link.
Here is some example code:
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            # input size of each layer will be inferred when omitted
            self.l1 = L.Linear(n_units)  # n_in -> n_units
            self.l2 = L.Linear(n_units)  # n_units -> n_units
            self.l3 = L.Linear(n_out)    # n_units -> n_out

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)

model = MLP(args.unit, 10)
classifier_model = L.Classifier(model)
if args.gpu >= 0:
    chainer.cuda.get_device_from_id(args.gpu).use()  # Make a specified GPU current
    classifier_model.to_gpu()  # Copy the model to the GPU

# Setup an optimizer
optimizer = chainer.optimizers.AdaGrad()
optimizer.setup(classifier_model)

# --- After `optimizer.setup()`, you can modify `hyperparam` of each parameter ---

# 1. Change `update_rule` for a specific parameter.
#    `l1` is a `Linear` link, which has the parameters `W` and `b`.
classifier_model.predictor.l1.W.update_rule.hyperparam.lr = 0.01

# 2. Change `update_rule` for all parameters (W & b) of one link.
for param in classifier_model.predictor.l2.params():
    param.update_rule.hyperparam.lr = 0.01

# --- You can set up the trainer module to train the model in the following...
...
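If you want a genuinely different optimizer (not just a different learning rate) for one layer, you can replace a parameter's update_rule entirely. A minimal, self-contained sketch, assuming Chainer v2 or later, where every GradientMethod optimizer provides create_update_rule() (the Net class and its layer sizes here are made up for illustration):

import chainer
import chainer.links as L

# Minimal two-layer chain for illustration (sizes are arbitrary).
class Net(chainer.Chain):
    def __init__(self):
        super(Net, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(4, 3)
            self.l2 = L.Linear(3, 2)

model = Net()

# Global optimizer: AdaGrad for every parameter.
optimizer = chainer.optimizers.AdaGrad(lr=0.001)
optimizer.setup(model)

# Give l2 a completely different rule: plain SGD with a larger lr.
sgd = chainer.optimizers.SGD(lr=0.1)
for param in model.l2.params():
    param.update_rule = sgd.create_update_rule()

print(model.l1.W.update_rule.hyperparam.lr)  # l1 keeps the AdaGrad lr
print(model.l2.W.update_rule.hyperparam.lr)  # l2 now uses the SGD lr

Note that the replacement rule carries its own hyperparam, so later changes to the AdaGrad optimizer's hyperparameters no longer affect l2.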
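A related per-layer trick: each UpdateRule also has an enabled flag, so you can freeze a layer entirely while the rest of the model trains. A small sketch, again assuming Chainer v2 or later:

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

layer = L.Linear(3, 2)
optimizer = chainer.optimizers.SGD(lr=0.1)
optimizer.setup(layer)

# Freeze the layer: a disabled UpdateRule is skipped by optimizer.update().
for param in layer.params():
    param.update_rule.enabled = False

w_before = layer.W.data.copy()
x = np.ones((1, 3), dtype=np.float32)
loss = F.sum(layer(x))
layer.cleargrads()
loss.backward()
optimizer.update()

assert np.array_equal(layer.W.data, w_before)  # weights unchanged

Setting enabled back to True resumes normal updates, which makes this handy for staged fine-tuning.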