Is the Adam optimizer really RMSprop plus momentum? If so, why doesn't it have a momentum parameter?
Here is a link to the TensorFlow optimizers. There you can see that RMSprop takes momentum as a parameter, while Adam does not. So I am confused. The Adam optimizer is often presented as RMSprop with momentum, like this:

Adam = RMSprop + momentum

But then why does RMSprop have a momentum parameter while Adam does not?
Although the expression "Adam is RMSProp with momentum" is indeed widely used, it is only a very rough shorthand and should not be taken at face value; this is already made explicit in the original Adam paper (p. 6):
There are a few important differences between RMSProp with momentum and Adam: RMSProp with momentum generates its parameter updates using a momentum on the rescaled gradient, whereas Adam updates are directly estimated using a running average of first and second moment of the gradient.
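The difference the paper describes can be sketched in a few lines of NumPy. This is an illustrative sketch, not any framework's actual implementation; the hyperparameter names (`lr`, `rho`, `mu`, `beta1`, `beta2`, `eps`) are generic placeholders:

```python
import numpy as np

def rmsprop_momentum_step(w, g, v, m, lr=1e-3, rho=0.9, mu=0.9, eps=1e-8):
    """RMSProp with momentum: momentum is applied to the RESCALED gradient."""
    v = rho * v + (1 - rho) * g ** 2            # running average of squared gradients
    m = mu * m + lr * g / (np.sqrt(v) + eps)    # momentum buffer on the rescaled gradient
    w = w - m
    return w, v, m

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected running averages of the first and second moments."""
    m = beta1 * m + (1 - beta1) * g             # first-moment estimate (this is where beta1 plays the "momentum" role)
    v = beta2 * v + (1 - beta2) * g ** 2        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t (1-indexed)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Note how in the first function the momentum buffer accumulates the already-rescaled gradient, whereas in Adam the first moment `m` is a running average of the raw gradient, rescaled only afterwards; `beta1` is the decay rate of that average, which is why it is the closest analogue to a momentum coefficient.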
Sometimes the authors make clear that this expression is only a loose description, e.g. in the (highly recommended) Overview of gradient descent optimization algorithms (emphasis added):
Adam also keeps an exponentially decaying average of past gradients mt, similar to momentum.
Or in Stanford CS231n: CNNs for Visual Recognition (again, emphasis added):
Adam is a recently proposed update that looks a bit like RMSProp with momentum.
That said, some other frameworks do include a momentum parameter for Adam, but it is actually the beta1 parameter; here is CNTK:

momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to this CNTK Wiki article.
So, don't take the expression too literally, and don't lose any sleep over it.