Is the Adam optimizer really RMSprop plus momentum? If so, why doesn't it have a momentum parameter?

Here is a link to the tensorflow optimizers. There you can see that RMSprop takes momentum as an argument while Adam does not, which confuses me. Adam optimization is presented as RMSprop with momentum, like this:

Adam = RMSprop + momentum

But why then does RMSprop have a momentum parameter while Adam does not?

While the expression "Adam is RMSProp with momentum" is indeed widely used, it is only a very rough shorthand and should not be taken at face value; this is already made clear in the original Adam paper (p. 6):

There are a few important differences between RMSProp with momentum and Adam: RMSProp with momentum generates its parameter updates using a momentum on the rescaled gradient, whereas Adam updates are directly estimated using a running average of first and second moment of the gradient.
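To make the quoted distinction concrete, here is a minimal NumPy sketch of one update step for each method. The hyperparameter names (lr, rho, mu, beta1, beta2) are illustrative, and the RMSProp-with-momentum variant follows the formulation commonly used in frameworks such as TensorFlow; treat this as a sketch rather than any library's exact implementation:

```python
import numpy as np

def rmsprop_momentum_step(w, g, v, buf, lr=1e-3, rho=0.9, mu=0.9, eps=1e-8):
    # RMSProp with momentum: the momentum buffer accumulates the *rescaled* gradient.
    v = rho * v + (1 - rho) * g ** 2               # running average of squared gradients
    buf = mu * buf + lr * g / (np.sqrt(v) + eps)   # momentum applied after rescaling
    return w - buf, v, buf

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: running averages of the first and second moments of the raw gradient.
    m = beta1 * m + (1 - beta1) * g                # first moment (the momentum-like part)
    v = beta2 * v + (1 - beta2) * g ** 2           # second moment (the RMSProp-like part)
    m_hat = m / (1 - beta1 ** t)                   # bias correction, t = 1, 2, ...
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Note that in Adam the momentum-like decay rate beta1 is baked into the first-moment average (and the update is additionally bias-corrected), so there is no separate momentum knob to expose.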

Occasionally, the authors themselves make it explicit that the expression in question is only a loose description, e.g. in the (highly recommended) Overview of gradient descent optimization algorithms (emphasis added):

Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum.

Or from Stanford CS231n: CNNs for Visual Recognition (again, emphasis added):

Adam is a recently proposed update that looks a bit like RMSProp with momentum.
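The "similar to" hedging in both quotes can be spelled out: classical momentum accumulates a scaled sum of past raw gradients, buf_t = mu * buf_{t-1} + lr * g_t, whereas Adam's m_t = beta1 * m_{t-1} + (1 - beta1) * g_t is an exponentially weighted *average* of past gradients that is then bias-corrected and divided by sqrt(v_hat_t). The two coincide only up to rescaling, and only if you ignore Adam's adaptive denominator.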

That said, some other frameworks do include a momentum parameter for Adam, but it is in fact the beta1 parameter; here is CNTK:

momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to this CNTK Wiki article.
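The same mapping is visible in the TensorFlow API the question links to: RMSprop exposes an explicit momentum argument, while in Adam the analogous knob is beta_1. A quick illustration with tf.keras (the values shown are just common defaults; momentum=0.9 is illustrative, as RMSprop's default is 0.0):

```python
import tensorflow as tf

# RMSprop has a dedicated momentum parameter...
rmsprop = tf.keras.optimizers.RMSprop(learning_rate=1e-3, rho=0.9, momentum=0.9)

# ...whereas in Adam the momentum-like role is played by beta_1,
# the decay rate of the first-moment running average.
adam = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)
```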

So, don't take the expression too literally, and don't lose any sleep over it.