How to parallelize this piece of code?
I've been browsing for a while, but I couldn't find any constructive answer that I could understand.
How should I parallelize the following code:
import random
import math
import numpy as np
import sys
import multiprocessing

boot = 20  # number of iterations to be performed

def myscript(iteration_number):
    # stuff that the code actually does
    pass

def main(unused_command_line_args):
    for i in range(boot):
        myscript(i)
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv))
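Or where can I read about it? I'm not even sure how to search for it.
For an embarrassingly parallel batch of jobs, where each iteration is independent of the others, going from a for loop to a parallel map is almost natural.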
>>> import multiprocess as mp
>>> # build a target function
>>> def doit(x):
...     return x**2 - 1
...
>>> x = range(10)
>>> # the for loop
>>> y = []
>>> for i in x:
...     y.append(doit(i))
...
>>> y
[-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
So how do you run this function in parallel?
>>> # convert the for loop to a map (still serial)
>>> y = list(map(doit, x))
>>> y
[-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
>>>
>>> # build a worker pool for parallel tasks
>>> p = mp.Pool()
>>> # do blocking parallel
>>> y = p.map(doit, x)
>>> y
[-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
>>>
>>> # use an iterator (non-blocking)
>>> y = p.imap(doit, x)
>>> y
<multiprocess.pool.IMapIterator object at 0x10358d150>
>>> print(list(y))
[-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
>>> # do asynchronous parallel
>>> y = p.map_async(doit, x)
>>> y
<multiprocess.pool.MapResult object at 0x10358d1d0>
>>> print(y.get())
[-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
>>>
>>> # or if you like for loops, there's always this…
>>> y = p.imap_unordered(doit, x)
>>> z = []
>>> for i in y:
...     z.append(i)
...
>>> z
[-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
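The last form is an unordered iterator, which tends to be the fastest. The trade-off is that you can't rely on the order in which the results come back: they are unordered, and are not guaranteed to return in the same order they were submitted.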
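If the submission order matters but you still want to consume results as they finish, one possible approach is to tag each input with its index and put the results back in place afterwards. The sketch below uses the standard-library multiprocessing; the helper names doit_indexed, restore_order and unordered_in_order are made up for illustration and are not part of any library.

```python
import multiprocessing


def doit(x):
    return x ** 2 - 1


def doit_indexed(pair):
    # Hypothetical wrapper: carry the submission index along with the result.
    index, x = pair
    return index, doit(x)


def restore_order(tagged, n):
    # Place each (index, result) pair back into its submission slot.
    ordered = [None] * n
    for index, result in tagged:
        ordered[index] = result
    return ordered


def unordered_in_order(values, processes=2):
    values = list(values)
    with multiprocessing.Pool(processes=processes) as pool:
        # Pairs arrive in completion order, which can differ from run to run.
        tagged = list(pool.imap_unordered(doit_indexed, enumerate(values)))
    return restore_order(tagged, len(values))


if __name__ == "__main__":
    print(unordered_in_order(range(10)))
    # prints [-1, 0, 3, 8, 15, 24, 35, 48, 63, 80]
```

This only pays off when you do something useful with each result as it arrives; if you just want the ordered list at the end, p.map is simpler.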
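Note also that I've used multiprocess (a fork) instead of multiprocessing, but purely because multiprocess is better at handling interactively defined functions. Otherwise the code above is the same for multiprocessing.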
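Applied to the script in the question, the same idea might look roughly like the sketch below, using the standard-library multiprocessing. The body of myscript is a stand-in (the real work isn't shown in the question), and it assumes each iteration depends only on its iteration number; run_all is a name invented here.

```python
import multiprocessing
import sys

boot = 20  # number of iterations to be performed


def myscript(iteration_number):
    # Placeholder for the real work; assumed independent between iterations.
    return iteration_number ** 2 - 1


def run_all(n_iterations, processes=None):
    # processes=None lets the pool use one worker per available CPU.
    with multiprocessing.Pool(processes=processes) as pool:
        # map() blocks until every task is done and keeps submission order.
        return pool.map(myscript, range(n_iterations))


def main(unused_command_line_args):
    results = run_all(boot)
    print(results)
    return 0


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    exit_code = main(sys.argv)  # hand this to sys.exit() in a real script
```

The worker function has to live at module level so it can be pickled and sent to the workers, and anything myscript needs back in the parent should be returned rather than written to a global.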