Is it normal that there is no data access synchronization when training a neural network with several threads?

I have looked at the classic word2vec sources and, if I understand correctly, there is no synchronization of data access when the neural network is trained by several threads (no synchronization of the matrices syn0, syn1, and syn1neg). Is this normal practice for training, or is it a bug?

Perhaps counterintuitively, this is normal. The seminal work on this approach is the 2011 'Hogwild!' paper:

https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent

From its abstract:

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called Hogwild which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then Hogwild achieves a nearly optimal rate of convergence. We demonstrate experimentally that Hogwild outperforms alternative schemes that use locking by an order of magnitude.

It turns out that synchronized access slows SGD down more than the threads occasionally overwriting each other's work does... and some results even suggest that, in practice, the extra "interference" may be a net benefit to optimization progress.
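
For concreteness, here is a minimal sketch of a Hogwild-style update loop in C (the language of the original word2vec code). It is not the word2vec implementation itself: the toy least-squares objective, the dimensions, and the learning rate are all illustrative assumptions. Several pthreads perform unlocked read-modify-write updates on a shared weight vector, just as word2vec's training threads do with syn0, syn1, and syn1neg:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define DIM     1000      /* size of the shared weight vector (arbitrary) */
    #define THREADS 4
    #define STEPS   100000
    #define ALPHA   0.01f     /* learning rate (arbitrary) */

    /* Shared weights, deliberately accessed with no lock -- the Hogwild idea. */
    static float weights[DIM];

    static void *train_thread(void *arg) {
        unsigned int seed = (unsigned int)(size_t)arg;
        for (long step = 0; step < STEPS; step++) {
            /* Sparse update: each step touches one random coordinate, so
             * collisions between threads are rare -- the sparsity condition
             * the Hogwild paper relies on. */
            int i = rand_r(&seed) % DIM;
            float grad = weights[i] - 1.0f;   /* gradient of 0.5*(w[i]-1)^2 */
            weights[i] -= ALPHA * grad;       /* unlocked read-modify-write */
        }
        return NULL;
    }

    int main(void) {
        pthread_t tids[THREADS];
        for (size_t t = 0; t < THREADS; t++)
            pthread_create(&tids[t], NULL, train_thread, (void *)(t + 1));
        for (size_t t = 0; t < THREADS; t++)
            pthread_join(tids[t], NULL);
        /* Despite the races, every coordinate converges to the optimum 1.0. */
        printf("w[0] = %f, w[%d] = %f\n", weights[0], DIM - 1, weights[DIM - 1]);
        return 0;
    }

Compiled with gcc -pthread, this reaches essentially the same solution as a single-threaded run. Wrapping the update line in a mutex would make it race-free in the C-standard sense, but, as the paper's benchmarks indicate, locking schemes can be an order of magnitude slower.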