Why does the learning rate change when using torch.optim.SGD?

With SGD, the learning rate should not change across epochs, but it does here. Please help me understand why this happens and how I can prevent this LR change.

import torch
params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9)
for epoch in range(5):
    print(scheduler.get_lr())
    scheduler.step()

The output is:

[0.9]
[0.7290000000000001]
[0.6561000000000001]
[0.5904900000000002]
[0.5314410000000002]

My torch version is 1.4.0.

Since you are using the command torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9) (which actually means torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)), you are multiplying the learning rate by gamma=0.9 every step_size=1 step (a quick arithmetic check follows the list below):

  • 0.9 = 0.9
  • 0.729 = 0.9*0.9*0.9
  • 0.6561 = 0.9*0.9*0.9*0.9
  • 0.59049 = 0.9*0.9*0.9*0.9*0.9
  • 0.531441 = 0.9*0.9*0.9*0.9*0.9*0.9
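
A quick arithmetic check of the list above; every printed value is lr * gamma**k for some integer k, and 0.81 is the one power that never appears in the question's output:

lr, gamma = 0.9, 0.9
# round() is only for readability; float arithmetic explains the trailing ...0000002 digits
print([round(lr * gamma ** k, 6) for k in range(6)])
# -> [0.9, 0.81, 0.729, 0.6561, 0.59049, 0.531441]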

The only "strange" point is that 0.81 = 0.9*0.9 is missing at the second step (update: see the explanation in the second answer below).

To prevent the decay from happening too early: if the dataset has N samples and the batch size is D, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=N/D, gamma=0.9) to decay once per epoch. To decay every E epochs instead, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=E*N/D, gamma=0.9). A sketch of this setup follows.
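A minimal sketch of that setup, assuming hypothetical values N=1000 and D=100 and that scheduler.step() is called once per batch:

import torch

N, D, E = 1000, 100, 2    # hypothetical dataset size, batch size, epoch interval
steps_per_epoch = N // D  # integer division, assuming N is a multiple of D

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)

# decay once per epoch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=steps_per_epoch, gamma=0.9)
# or, to decay every E epochs instead:
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=E * steps_per_epoch, gamma=0.9)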

This is exactly what torch.optim.lr_scheduler.StepLR is supposed to do: it changes the learning rate. From the PyTorch documentation:

Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr

If you are actually trying to optimize params, your code should look more like this (just a toy example; the exact form of loss will depend on your application):

for epoch in range(5):
    optimizer.zero_grad()          # clear gradients from the previous iteration
    loss = (params[0] ** 2).sum()  # toy loss; yours will differ
    loss.backward()                # compute gradients
    optimizer.step()               # update params using the current learning rate

Expanding on the answer above about the "strange" behavior (0.81 is missing): this has been PyTorch's default behavior since the 1.1.0 release; check the documentation, specifically this part:

[...] If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule.

Moreover, you should get a UserWarning thrown by this function after the first get_lr() call, because you are not calling optimizer.step() at all. A sketch of the corrected loop follows.
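
A minimal sketch of the fixed loop, assuming PyTorch >= 1.4 (where scheduler.get_last_lr() reports the rate actually in use, without the UserWarning); stepping the optimizer before the scheduler means no value of the schedule is skipped:

import torch

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for epoch in range(5):
    print(scheduler.get_last_lr())  # current lr, no UserWarning
    optimizer.zero_grad()
    loss = (params[0] ** 2).sum()   # same toy loss as above
    loss.backward()
    optimizer.step()                # optimizer first ...
    scheduler.step()                # ... then the scheduler

# prints approximately [0.9], [0.81], [0.729], [0.6561], [0.59049] -- 0.81 is no longer skipped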