How to get the greatest number in a list of numbers using multiprocessing

I have a list of random numbers, and I would like to get the greatest number using multiprocessing.

This is the code I used to generate the list:

import random
randomlist = []
for i in range(100000000):
    n = random.randint(1,30000000)
    randomlist.append(n)

Getting the greatest number with a serial process:

import time

greatest = 0 # global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    # No `global` statement is needed here: this code already runs at module level.
    t2 = time.time()
    greatest = 0

    for x in randomlist:
        f(x)

    print("serial process took:", time.time()-t2)
    print("greatest = ", greatest)

This is my attempt to get the greatest number using multiprocessing:

from multiprocessing import Pool
import time

greatest = 0 # the global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    # No `global` statement is needed here: this code already runs at module level.
    greatest = 0
    t1 = time.time()
    p = Pool()  # or Pool(processes=3)
    result = p.map(f,randomlist)
    p.close()
    p.join()
    print("pool took:", time.time()-t1)
    print("greatest = ", greatest)

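The output here is 0. Clearly the global variable is not shared: each worker process updates its own copy of greatest, and the copy in the parent process never changes. How can I fix this without hurting performance?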

As suggested by @Barmar, split randomlist into chunks, compute the local maximum of each chunk, and finally compute the global maximum from local_maximum_list:

import multiprocessing as mp
import numpy as np
import random
import time

CHUNKSIZE = 10000

def local_maximum(l):
    m = max(l)
    print(f"Local maximum: {m}")
    return m

if __name__ == '__main__':
    randomlist = np.random.randint(1, 30000000, 100000000)

    start = time.time()
    chunks = (randomlist[i:i+CHUNKSIZE]
                  for i in range(0, len(randomlist), CHUNKSIZE))

    with mp.Pool(mp.cpu_count()) as pool:
        local_maximum_list = pool.map(local_maximum, chunks)
    print(f"Global maximum: {max(local_maximum_list)}")
    end = time.time()
    print(f"MP Elapsed time: {end-start:.2f}s")
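If you want to reuse the chunk-then-reduce pattern, it could be wrapped roughly like this. This is only a sketch using the standard library: `chunked` and `parallel_max` are names I made up, the 200,000-element input is just a quick sanity check, and it assumes a non-empty sequence that supports `len()` and slicing.

```python
import multiprocessing as mp
import random


def chunked(seq, size):
    """Yield consecutive slices of `seq`, each holding at most `size` items."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def parallel_max(seq, chunksize=10_000, processes=None):
    """Return max(seq) by reducing per-chunk maxima computed in a pool."""
    with mp.Pool(processes) as pool:
        # The built-in max is pickled by reference, so it can serve as the worker.
        local_maxima = pool.map(max, chunked(seq, chunksize))
    return max(local_maxima)


if __name__ == "__main__":
    data = [random.randint(1, 30_000_000) for _ in range(200_000)]
    assert parallel_max(data) == max(data)
    print("parallel and serial maxima agree")
```

Each chunk has to be pickled and sent to a worker, so very small chunks spend most of their time on communication; the chunk size is worth tuning for your data.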

Performance

It is interesting how much the way the random list is created affects the performance of multiprocessing:

Scenario 1:
randomlist = np.random.randint(1, 30000000, 100000000)
MP Elapsed time: 1.63s

Scenario 2:
randomlist = np.random.randint(1, 30000000, 100000000).tolist()
MP Elapsed time: 6.02s

Scenario 3:
randomlist = [random.randint(1, 30000000) for _ in range(100000000)]
MP Elapsed time: 7.14s

Scenario 4:
randomlist = list(np.random.randint(1, 30000000, 100000000))
MP Elapsed time: 184.28s

Scenario 5:
randomlist = []
for _ in range(100000000):
    n = random.randint(1, 30000000)
    randomlist.append(n)
MP Elapsed time: 7.52s
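
I did not profile these runs, so take this as a plausible explanation rather than a measured one: every chunk is pickled on its way to a worker, and the element type decides how expensive that is. `.tolist()` converts the elements to plain Python ints, while `list(array)` keeps them as NumPy scalar objects, which pickle far less compactly. The sketch below compares one chunk in each representation; the helper `round_trip` and the labels are mine, not part of the benchmark above.

```python
import pickle
import time

import numpy as np

CHUNKSIZE = 10_000
chunk = np.random.randint(1, 30_000_000, CHUNKSIZE)

candidates = {
    "ndarray slice (Scenario 1)": chunk,
    "list of Python ints (Scenarios 2, 3, 5)": chunk.tolist(),
    "list of NumPy scalars (Scenario 4)": list(chunk),
}


def round_trip(obj):
    """Pickle and unpickle `obj`; return (payload size in bytes, seconds taken)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj)
    pickle.loads(payload)
    return len(payload), time.perf_counter() - start


for label, obj in candidates.items():
    size, seconds = round_trip(obj)
    print(f"{label}: {size} bytes, {seconds * 1000:.2f} ms")
```

If the NumPy-scalar list turns out to be the clear outlier on your machine as well, that would match Scenario 4 being so much slower than the others.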