What is the flow of the learning process within a neural network with respect to batch size and epochs?

I am confused about what happens inside a neural network with respect to the terms batch size, epochs, and how the weights are updated during the process.

I want to verify whether my understanding of the flow, in the following order, is valid:

Consider that one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose a batch size of 2.
Now I want to make the model learn.

Executing the first epoch

Executing the first batch

    Data point 1: the 8 features' values go through the 8 input nodes.
        Random weights are initialised.
        Forward propagation happens.
        Backward propagation happens.
        As a result of backward propagation, all the weights are updated.

    Data point 2: the 8 features' values go through the 8 input nodes.
        Forward propagation happens with the updated weights found from the previous (i.e. Data point 1) backpropagation result.
        Backward propagation happens and all the weights are again updated.

Executing the second batch

    Data point 3: the 8 features' values go through the 8 input nodes.
        Forward propagation happens with the updated weights found from the previous (i.e. Data point 2) backpropagation result.
        Backward propagation happens and all the weights are again updated.

This process continues until the first epoch ends.

Executing the second epoch

Executing the first batch

    Data point 1: the 8 features' values go through the 8 input nodes.
        No random weights this time. Forward propagation happens with the last backpropagated weights (from the last batch executed in the first epoch).
        Backward propagation happens and all the weights are again updated.

This process continues until the second epoch ends.

This process continues until the desired number of epochs is reached.
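
In code, the flow I am describing looks roughly like this (a minimal NumPy sketch with a single linear layer; the loss, gradient formula and learning rate are just placeholders for whatever the real model uses):

import numpy as np

n_points, n_features, batch_size, n_epochs = 20, 8, 2, 3
X = np.random.rand(n_points, n_features)   # 20 data points, 8 features each
y = np.random.rand(n_points, 1)

W = np.random.randn(n_features, 1)          # random weights, initialised once
lr = 0.01                                   # hypothetical learning rate

for epoch in range(n_epochs):
    for start in range(0, n_points, batch_size):        # one batch of size 2
        for i in range(start, start + batch_size):       # one data point at a time
            x_i = X[i:i + 1]
            pred = x_i @ W                                # forward propagation
            grad = 2 * x_i.T @ (pred - y[i:i + 1])        # backward propagation (squared-error gradient)
            W -= lr * grad                                # weights updated after every data point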

The steps you have mentioned are for stochastic gradient descent, where the batch size plays no role: the weights are updated after every data point and used for evaluating the next data point.

For a mini-batch scenario like batch size = 2, the new weights should be computed (via backpropagation) for the batch as a whole and then used for the next batch (of size 2), and so on until all batches have been processed. Everything else you said is correct.
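
As a rough sketch of that difference (plain NumPy, with a hypothetical single-layer linear model and learning rate), the mini-batch version accumulates the gradient over both data points of the batch and applies a single weight update per batch:

import numpy as np

X = np.random.rand(20, 8)                 # 20 data points, 8 features
y = np.random.rand(20, 1)
W = np.random.randn(8, 1)
lr, batch_size = 0.01, 2

for epoch in range(3):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]              # both data points of the batch
        yb = y[start:start + batch_size]
        pred = xb @ W                                 # forward propagation for the whole batch
        grad = 2 * xb.T @ (pred - yb) / batch_size    # gradient averaged over the batch
        W -= lr * grad                                # one weight update per batch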

You got almost everything right, except the backpropagation weight update. The error is computed for each sample in the mini-batch, but the weights are only updated after all samples in the mini-batch have gone through forward propagation. You can read more about it here
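
In formula form, with batch size $B$ and learning rate $\eta$, and assuming the per-sample gradients are averaged (some implementations sum them instead):

$$w \leftarrow w - \frac{\eta}{B}\sum_{i=1}^{B} \nabla_w L(x_i, y_i; w)$$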

The mini-batch handling is wrong: for one batch, we compute the gradients of the whole batch at once, then we sum all the gradients, and then we update the weights once per batch.

Below is code for a simple example, y = W * x, that illustrates the gradient computation d(loss)/d(W) for mini-batch and single inputs:

# TensorFlow 1.x API (placeholders and sessions); also uses NumPy.
import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)

loss = tf.square(out - Y)
# Calculate the error gradient with respect to the weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Giving individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]

    # Giving a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]], which is the sum of the above gradients.
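
The same numbers can be checked by hand: for loss = (W·x − y)², the gradient is d(loss)/dW = 2·(W·x − y)·x, and the gradient for the batch is the sum over its samples. A quick check in plain Python (values match up to floating-point rounding):

W = 0.2
samples = [(0.1, 0.05), (0.2, 0.1)]

for x, y in samples:
    print(2 * (W * x - y) * x)                          # ≈ -0.006, then ≈ -0.024

print(sum(2 * (W * x - y) * x for x, y in samples))     # ≈ -0.03, the sum of the two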