Should I shuffle the data to train a neural network using backpropagation?
I want to train a neural network using backpropagation, and I have a dataset like this:
Should I shuffle the input data?
Yes, and it should be shuffled at each iteration, e.g. quoting from {1}:
As for any stochastic gradient descent method (including the mini-batch case), it is important for efficiency of the estimator that each example or minibatch be sampled approximately independently. Because random access to memory (or even worse, to disk) is expensive, a good approximation, called incremental gradient (Bertsekas, 2010), is to visit the examples (or mini-batches) in a fixed order corresponding to their order in memory or disk (repeating the examples in the same order on a second epoch, if we are not in the pure online case where each example is visited only once). In this context, it is safer if the examples or mini-batches are first put in a random order (to make sure this is the case, it could be useful to first shuffle the examples). Faster convergence has been observed if the order in which the mini-batches are visited is changed for each epoch, which can be reasonably efficient if the training set holds in computer memory.
{1} Bengio, Yoshua. "Practical recommendations for gradient-based training of deep architectures." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 437-478.
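For illustration, here is a minimal sketch of the recommendation in the quote: reshuffle the order of the training examples (and hence of the mini-batches) at the start of every epoch. The arrays `X`, `y` and the `sgd_step` callback are hypothetical placeholders, not part of any particular library; only the shuffling logic is the point.

import numpy as np

def train(X, y, sgd_step, n_epochs=10, batch_size=32, seed=0):
    """Visit mini-batches in a freshly shuffled order each epoch.

    X, y     : training inputs and targets (NumPy arrays, same length)
    sgd_step : callable performing one backprop/SGD update on a mini-batch
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for epoch in range(n_epochs):
        # New random permutation of the example indices every epoch,
        # as the quoted passage suggests for faster convergence.
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            sgd_step(X[idx], y[idx])

If the whole training set fits in memory, this per-epoch reshuffling costs only the permutation of an index array, which is cheap compared with the gradient computations themselves.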