Why doesn't the learning rate change?
I am using the TensorFlow Object Detection API tutorial https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/index.html to train my custom model. Following those instructions, I train with a configuration file and the train.py script from the official GitHub repository. The config file suggests that the learning rate should be adaptive, as these lines show:
train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
Then I watched TensorBoard during training, and it shows the learning rate as constant at every training step. Why is that? Could it be that TensorBoard only sees the initial value of the learning rate, while the optimizer computes the actual value on the fly?
From the documentation I understand that the formula for the decayed learning rate is:
decayed_learning_rate = learning_rate *
decay_rate ^ (global_step / decay_steps)
where global_step has to be supplied as follows:
[...] requires a global_step value to compute the decayed learning rate.
You can just pass a TensorFlow variable that you increment at each training step.
So maybe you only need to pass that global_step argument for the rate to actually decay?
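As I understand it, that wiring would look roughly like the sketch below (an assumption on my part, using the TF 1.x API that train.py is built on; the values are taken from the config above, and w/loss are only a toy stand-in for the real detection model):

import tensorflow as tf

# Toy variable and loss standing in for the real detection model.
w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))

# The step counter that drives the decay schedule.
global_step = tf.Variable(0, trainable=False, name='global_step')

learning_rate = tf.train.exponential_decay(
    learning_rate=0.004,       # initial_learning_rate from the config
    global_step=global_step,
    decay_steps=800720,        # decay_steps from the config
    decay_rate=0.95,           # decay_factor from the config
    staircase=True)

optimizer = tf.train.RMSPropOptimizer(
    learning_rate, decay=0.9, momentum=0.9, epsilon=1.0)

# Passing global_step here makes minimize() increment it on every run of
# train_op, which is what moves the schedule forward.
train_op = optimizer.minimize(loss, global_step=global_step)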
In the API, the optimizer is built in this file, and this is the line for the rms_prop_optimizer. To construct the optimizer's learning rate, that builder calls a function _create_learning_rate, which eventually calls into learning_schedules under object_detection/utils. The following is how the learning rate is scheduled in your example:
def exponential_decay_with_burnin(global_step,
                                  learning_rate_base,
                                  learning_rate_decay_steps,
                                  learning_rate_decay_factor,
                                  burnin_learning_rate=0.0,
                                  burnin_steps=0,
                                  min_learning_rate=0.0,
                                  staircase=True):
  """Exponential decay schedule with burn-in period.

  In this schedule, learning rate is fixed at burnin_learning_rate
  for a fixed period, before transitioning to a regular exponential
  decay schedule.

  Args:
    global_step: int tensor representing global step.
    learning_rate_base: base learning rate.
    learning_rate_decay_steps: steps to take between decaying the learning rate.
      Note that this includes the number of burn-in steps.
    learning_rate_decay_factor: multiplicative factor by which to decay
      learning rate.
    burnin_learning_rate: initial learning rate during burn-in period. If
      0.0 (which is the default), then the burn-in learning rate is simply
      set to learning_rate_base.
    burnin_steps: number of steps to use burnin learning rate.
    min_learning_rate: the minimum learning rate.
    staircase: whether use staircase decay.

  Returns:
    a (scalar) float tensor representing learning rate
  """
  if burnin_learning_rate == 0:
    burnin_learning_rate = learning_rate_base
  post_burnin_learning_rate = tf.train.exponential_decay(
      learning_rate_base,
      global_step - burnin_steps,
      learning_rate_decay_steps,
      learning_rate_decay_factor,
      staircase=staircase)
  return tf.maximum(tf.where(
      tf.less(tf.cast(global_step, tf.int32), tf.constant(burnin_steps)),
      tf.constant(burnin_learning_rate),
      post_burnin_learning_rate), min_learning_rate, name='learning_rate')
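For the config above, the builder would presumably end up making a call roughly equivalent to the sketch below (an illustration only; in practice this call is made inside _create_learning_rate, not in user code):

learning_rate = exponential_decay_with_burnin(
    global_step=tf.train.get_or_create_global_step(),
    learning_rate_base=0.004,           # initial_learning_rate
    learning_rate_decay_steps=800720,   # decay_steps
    learning_rate_decay_factor=0.95)    # decay_factor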
Here is the learning rate decay plot. Even after 100,000 steps, the decay is in fact very small.
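A quick back-of-the-envelope check of the formula quoted above confirms why the curve looks flat: with decay_steps set to 800720, even 100,000 steps shrink the rate by less than one percent (and with staircase decay, which floors the exponent, the rate would not move at all before step 800720):

# Plug the config values into: decayed_lr = lr * decay_rate ** (step / decay_steps)
initial_lr, decay_factor, decay_steps = 0.004, 0.95, 800720

for step in (0, 50000, 100000, 800720):
    decayed = initial_lr * decay_factor ** (step / decay_steps)
    print('step %7d: lr = %.6f' % (step, decayed))

# step       0: lr = 0.004000
# step   50000: lr = 0.003987
# step  100000: lr = 0.003974
# step  800720: lr = 0.003800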