Need clarification regarding SGD Optimizer

Question

I have a question about the SGD optimizer.

There are three types of gradient descent algorithm:

Batch gradient descent
Mini-batch gradient descent
Stochastic gradient descent

Stochastic gradient descent is an algorithm in which a single instance is picked at random from the training set and the weights are updated with respect to that instance.

The SGD optimizer seems to deviate slightly from this definition, because it accepts a batch_size greater than 1. Can someone clarify this deviation?

The code below seems to match the definition of stochastic gradient descent:

model.compile(optimizer='sgd', loss='mse')
model.fit(x, y, epochs=500, batch_size=1, verbose=1)

However, the code below seems confusing, since it deviates from that definition (batch_size > 1):

model.compile(optimizer='sgd', loss='mse')
model.fit(x, y, epochs=500, batch_size=32, verbose=1)

Thanks in advance for the clarification.

Answer 1

Quoting Wikipedia:

It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data)

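So all three variants you mention are SGD. Even if you run an SGD iteration using all of your data, the result is still a stochastic estimate of the actual gradient: your data set does not contain all the data in the universe, so the estimate would change as new data is collected, and in that sense it is stochastic.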
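
To make this concrete: in Keras, the batch_size passed to fit decides how many samples go into each gradient estimate, while the 'sgd' optimizer only applies the update. Below is a minimal NumPy sketch of that idea, not Keras's actual implementation. The function name sgd_fit, the toy data, and the learning rate are assumptions made up for illustration. The same update rule is run with batch_size = 1, 32, and the full data set; only the subset used to estimate the gradient changes.

```python
import numpy as np


def sgd_fit(x, y, batch_size, epochs=500, lr=0.05, seed=0):
    """Fit y ~ w * x + b with MSE loss using mini-batch gradient updates.

    Hypothetical helper for illustration; this is not how Keras is implemented.
    """
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the training set every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]
            # Gradient of the mean squared error over this batch only,
            # i.e. an estimate of the gradient over the whole data set.
            grad_w = 2.0 * np.mean(err * x[idx])
            grad_b = 2.0 * np.mean(err)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Assumed toy data: y = 3x + 1 without noise.
x = np.linspace(-1.0, 1.0, 64)
y = 3.0 * x + 1.0

# batch_size=1 is "pure" SGD, 32 is mini-batch, len(x) is full-batch descent.
for batch_size in (1, 32, len(x)):
    w, b = sgd_fit(x, y, batch_size)
    print(batch_size, round(float(w), 3), round(float(b), 3))
```

On this toy problem all three settings should approach w = 3 and b = 1. They differ in how many updates happen per epoch and how noisy each update is, not in the update rule itself.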

keras

tensorflow

tf.keras