在 MPI 中应用 Reduce 和 Broadcast

Applying Reduce and Broadcast in MPI

我正在设计一个 Python 程序来计算 n 的某个固定值在 k < n < k^2 之间的素数个数。我的原始实现通过按处理器数量拆分计算来计算素数的数量。

我的意图是应用 reducebroadcast 命令来并行化所有处理器之间的工作。我是 MPI 的新手,不知道该怎么做,所以我在代码中注释掉了这两行。我很好奇是否应该遵循特定的方向来完成 MPI 中的 reducebroadcast 命令。我的原码是

import numpy as np
import platform
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD

id = comm.Get_rank( )

p = comm.Get_size( )
p=1

# Find the primes between 2 and k. Initialize k.
k=10
# Define a list S_k of the primes between 2 and k
primes=[]
# Define a list to store numbers that aren't prime between 2 and k.
not_prime = []
# Define a list S_k2 of the primes between k and k**2
primes2=[]


# Count the number of primes from 2 to k
for i in range(2, k+1):
    if i not in not_prime:
        primes.append(i)
        for j in range(i*i, k+1, i):
            not_prime.append(j)


# Find the number of primes between k and k**2
b=(k**2-k)/p
for n in range(int(k+(p-1)*b),int(k+(p)*b)):
    counter = 0
    for i in range(len(primes)):
        if (n % primes[i]) == 0:
            break
        else:
            counter = counter + 1
    if (counter==len(S_k)):
        primes2.append(n)

# I'm not sure what to use as parameters for comm.Reduce and comm.bcast
# comm.reduce = (primes2, op = MPI.SUM, root = 0 )
# comm.bcast |= (primes2, op = MPI.SUM, root = 0 )

print ("Number of processors: ",p)
print (primes2)
print((int(k+(p-1)*b),int(k+(p)*b)))

针对 p=1 进行测试时,代码 returns

Number of processors:  1
[11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
(10, 100)

针对 p=2 进行测试时,代码 returns

Number of processors:  2
[59, 61, 67, 71, 73, 79, 83, 89, 97]
(55, 100)

针对 p=3 进行测试时,代码 returns

Number of processors:  3
[71, 73, 79, 83, 89, 97]
(70, 100)

并且随着 p 的增加,primes2 中素数的数量减少。

理想情况下,我正在研究如何并行化代码以便针对 p=3 进行测试 returns

Number of processors:  3
Processor 1 computed [11, 13, 17, 19, 23, 29, 31, 37]
Processor 2 computed [41, 43, 47, 53, 59, 61, 67]
Processor 3 computed [71, 73, 79, 83, 89, 97]

我认为这可以通过为 MPI 应用 reduce 和 broadcast 命令来完成。我不确定我应该如何调整

# comm.reduce = (primes2, op = MPI.SUM, root = 0 )
# comm.bcast |= (primes2, op = MPI.SUM, root = 0 )

以便各个处理器计算素数的子集。

#mpiexec -n 3 python n_loop.py

import numpy as np
import platform
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD

id = comm.Get_rank( )

p = comm.Get_size( )

# Find the primes between 2 and k. Initialize k.
k=20
# Define a list S_k of the primes between 2 and k
S_k=[]
# Define a list to store numbers that aren't prime between 2 and k.
not_prime = []
# Define a list S_k2 of the primes between k and k**2
S_k2=[]


# Count the number of primes from 2 to k
for i in range(2, k+1):
    if i not in not_prime:
        S_k.append(i)
        for j in range(i*i, k+1, i):
            not_prime.append(j)


# Find the number of primes between k and k**2 by
# pararllelizing the n-loop.
b=(k**2-k)/p
for n in range(int(k+(id)*b),int(k+(id+1)*b)):
    counter = 0
    for i in range(len(S_k)):
        if (n % S_k[i]) == 0:
            break
        else:
            counter = counter + 1
    if (counter==len(S_k)):
        S_k2.append(n)

# Compute the amount of primes in the two lists.
processor_num_primes = len(S_k2)
original_num_primes = len(S_k)

# Broadcast the amount of primes and calculate the unions
# of sets of integers.
countb = 0
totalb = 0
for i in range(0,p):
    countb = comm.bcast(S_k2,i)
    totalb = totalb + len(countb)
total = comm.reduce(totalb,MPI.BOR,0)


if (id == 0):
    print ("Total number of processors: ",p)
print ("Processor ",id, "is calculating:", S_k2)
print ("Processor ",id, "is calculating this many primes :",processor_num_primes)
print("Processor ",id,"has the following range:",(int(k+(id)*b),int(k+(id+1)*b)))
if (id == 0):
    print("The total number of primes found between",2,"and",k*k,"by the n-"
    "loop is:",total+original_num_primes)
print('')