numpy.random.normal() in multithreaded environment

Question

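I am trying to parallelize the generation of normally distributed random arrays with numpy.random.normal(), but the calls from the threads appear to be executed sequentially.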

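import threading
import time

import numpy as np

start = time.time()

for i in range(100000):
    # Each thread draws 4000 samples from N(0, 2); the result is discarded.
    t = threading.Thread(target=lambda: np.random.normal(0, 2, 4000))
    t.start()

    # Throttle: count the live worker threads (excluding the main thread)
    # and wait while 12 or more of them are still running.
    while threading.active_count() - 1 >= 12:
        time.sleep(0.1)

end = time.time()
print(end - start)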

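If I time the run with 1 thread, I get the same result as with 12 threads. I know this kind of parallelization carries a lot of overhead, but even so I would expect the multithreaded version to save some time.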

Answer 1

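np.random.normal uses seed state internally. As far as I know, that state belongs to the single global generator behind the legacy np.random functions (a module-level RandomState rather than a fresh default_rng() instance), and it is certainly shared between threads. Calling it from multiple threads therefore gains nothing: access to the shared state has to be serialized to avoid race conditions. In fact, the documentation provides examples for this case (see here and there). Alternatively, you can use multiple processes (you need to configure the seed so that different processes produce different results). Another solution is to use custom random number generators (RNGs), so that each thread works with its own RNG object.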
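A minimal sketch of the "one RNG object per thread" idea, not a tuned implementation. It assumes a fixed pool of 12 workers (the number from the question) and uses np.random.SeedSequence.spawn to derive independent child seeds; the constants N_TASKS and SAMPLES_PER_TASK, the root seed 12345, and the helpers worker and generate are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N_THREADS = 12           # worker count taken from the question
N_TASKS = 120            # hypothetical total number of arrays to generate
SAMPLES_PER_TASK = 4000  # array length used in the question

# Derive one statistically independent child seed per worker from a single
# root seed, then build a separate Generator for each worker.
root = np.random.SeedSequence(12345)  # arbitrary root seed (assumption)
rngs = [np.random.default_rng(child) for child in root.spawn(N_THREADS)]


def worker(rng, n_arrays):
    """Draw n_arrays arrays from N(0, 2) using this worker's own generator."""
    return [rng.normal(0, 2, SAMPLES_PER_TASK) for _ in range(n_arrays)]


def generate():
    """Split the work evenly; each Generator is touched by one thread only."""
    per_worker = N_TASKS // N_THREADS
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        futures = [pool.submit(worker, rng, per_worker) for rng in rngs]
        chunks = [f.result() for f in futures]
    # Flatten the per-worker lists into one (N_TASKS, SAMPLES_PER_TASK) array.
    return np.array([arr for chunk in chunks for arr in chunk])


samples = generate()
```

Because no Generator is shared, the threads do not contend for the same state. Whether this is actually faster than a single thread depends on the array size: with only 4000 samples per call, thread and Python call overhead may still dominate, so it is worth timing both variants. The same SeedSequence.spawn approach can be used to seed one generator per process if you go the multiprocessing route.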

Tags: python, random, parallel-processing, multithreading, numpy