What is the flow of the learning process within a neural network with respect to batch size and epochs?

I am confused about what happens inside a neural network with respect to the terms batch size, epochs, and how the weights are updated during the process.

I want to verify whether my understanding of the flow, in the following order, is valid:

Consider that one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose a batch size of 2.
Now I want to make the model learn.

Executing the first epoch

Executing the first batch

    Data point 1: the 8 features' values go through the 8 input nodes.
        Random weights are initialised.
        Forward propagation happens.
        Backward propagation happens.
        As a result of backward propagation, all the weights are updated.

    Data point 2: the 8 features' values go through the 8 input nodes.
        Forward propagation happens with the updated weights found from the previous (i.e. Data point 1) backpropagation result.
        Backward propagation happens and all the weights are again updated.

Executing the second batch

    Data point 3: the 8 features' values go through the 8 input nodes.
        Forward propagation happens with the updated weights found from the previous (i.e. Data point 2) backpropagation result.
        Backward propagation happens and all the weights are again updated.

This process continues until the first epoch ends.

Executing the second epoch

Executing the first batch

    Data point 1: the 8 features' values go through the 8 input nodes.
        No random weights this time. Forward propagation happens with the last backpropagated weights (from the last batch executed in the first epoch).
        Backward propagation happens and all the weights are again updated.

This process continues until the second epoch ends.

This process continues until the desired number of epochs is reached.
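
In code, the flow I am describing looks roughly like this (a minimal NumPy sketch with a single linear layer; the loss, gradient formula and learning rate are just placeholders for whatever the real model uses):

import numpy as np

n_points, n_features, batch_size, n_epochs = 20, 8, 2, 3
X = np.random.rand(n_points, n_features)   # 20 data points, 8 features each
y = np.random.rand(n_points, 1)

W = np.random.randn(n_features, 1)          # random weights, initialised once
lr = 0.01                                   # hypothetical learning rate

for epoch in range(n_epochs):
    for start in range(0, n_points, batch_size):        # one batch of size 2
        for i in range(start, start + batch_size):       # one data point at a time
            x_i = X[i:i + 1]
            pred = x_i @ W                                # forward propagation
            grad = 2 * x_i.T @ (pred - y[i:i + 1])        # backward propagation (squared-error gradient)
            W -= lr * grad                                # weights updated after every data point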

The steps you have mentioned are for stochastic gradient descent, where the batch size plays no role: the weights are updated after every data point and used for evaluating the next data point.

For a mini-batch scenario like batch size = 2, the new weights should be computed (via backpropagation) for the batch as a whole and then used for the next batch (of size 2), and so on until all batches have been processed. Everything else you said is correct.
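
As a rough sketch of that difference (plain NumPy, with a hypothetical single-layer linear model and learning rate), the mini-batch version accumulates the gradient over both data points of the batch and applies a single weight update per batch:

import numpy as np

X = np.random.rand(20, 8)                 # 20 data points, 8 features
y = np.random.rand(20, 1)
W = np.random.randn(8, 1)
lr, batch_size = 0.01, 2

for epoch in range(3):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]              # both data points of the batch
        yb = y[start:start + batch_size]
        pred = xb @ W                                 # forward propagation for the whole batch
        grad = 2 * xb.T @ (pred - yb) / batch_size    # gradient averaged over the batch
        W -= lr * grad                                # one weight update per batch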

You got almost everything right, except the backpropagation weight update. The error is computed for each sample in the mini-batch, but the weights are only updated after all samples in the mini-batch have gone through forward propagation. You can read more about it here
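
In formula form, with batch size $B$ and learning rate $\eta$, and assuming the per-sample gradients are averaged (some implementations sum them instead):

$$w \leftarrow w - \frac{\eta}{B}\sum_{i=1}^{B} \nabla_w L(x_i, y_i; w)$$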

The mini-batch handling is wrong: for one batch, we compute the gradients of the whole batch at once, then we sum all the gradients, and then we update the weights once per batch.

Below is code for a simple example, y = W * x, that illustrates the gradient computation d(loss)/d(W) for mini-batch and single inputs:

# TensorFlow 1.x API (placeholders and sessions); also uses NumPy.
import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)

loss = tf.square(out - Y)
# Calculate the error gradient with respect to the weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Giving individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]

    # Giving a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]], which is the sum of the above gradients.
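
The same numbers can be checked by hand: for loss = (W·x − y)², the gradient is d(loss)/dW = 2·(W·x − y)·x, and the gradient for the batch is the sum over its samples. A quick check in plain Python (values match up to floating-point rounding):

W = 0.2
samples = [(0.1, 0.05), (0.2, 0.1)]

for x, y in samples:
    print(2 * (W * x - y) * x)                          # ≈ -0.006, then ≈ -0.024

print(sum(2 * (W * x - y) * x for x, y in samples))     # ≈ -0.03, the sum of the two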