Python: multiprocess workers,跟踪任务完成(缺失完成)
Python: multiprocess workers, tracking tasks completed (missing completions)
默认 multiprocessing.Pool
代码包含一个计数器,用于跟踪工作人员已完成的任务数:
completed += 1
logging.debug('worker exiting after %d tasks' % completed)
但是从 range(12)
上升到 range(20)
a pool.map
会导致计数器出错(这似乎与工人创建无关)。我也不太清楚是什么原因造成的。
例如:
import multiprocessing as mp
def ret_x(x):
return x
def inform():
print('made a worker!')
pool = mp.Pool(2, maxtasksperchild=2, initializer=inform)
res= pool.map(ret_x, range(8))
print(res)
将正常工作给予:
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
[0, 1, 2, 3, 4, 5, 6, 7]
但是将 range
更改为 20
不会显示正在创建任何额外的工作人员或总共 20 个已完成的任务,即使在预期列表中返回了完成的范围。
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
worker exiting after 1 tasks
之所以如此,是因为您没有在 pool.map 中明确定义 "chunksize":
map(func, iterable[, chunksize])
This method chops the iterable into a number of chunks which it
submits to the process pool as separate tasks. The (approximate) size
of these chunks can be specified by setting chunksize to a positive
integer
来源:https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
对于 8 个项目,考虑 len(pool)=2,chunksize 将为 1 ( divmod(8,2*4)) 所以你看到 (8/1)/2 workers = 4 workers
workers = (len of items / chunksize) / tasks per process
对于 20 个项目,考虑到 len(pool)=2,chunksize 将为 3 (divmode(20,2*4)) 所以你会看到类似 (20/3)/2 = 3.3 workers
对于 40...chunksize=5,workers= (40/5)/5 = 4 workers
如果需要,可以设置chunksize=1
res = pool.map(ret_x, range(40), 1)
你会看到 (20/1)/2 = 10 个工人
python mppp.py
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
所以块大小就像一个进程的单元工作量......或类似的东西。
如何计算块大小:https://hg.python.org/cpython/file/1c54def5947c/Lib/multiprocessing/pool.py#l305
默认 multiprocessing.Pool
代码包含一个计数器,用于跟踪工作人员已完成的任务数:
completed += 1
logging.debug('worker exiting after %d tasks' % completed)
但是从 range(12)
上升到 range(20)
a pool.map
会导致计数器出错(这似乎与工人创建无关)。我也不太清楚是什么原因造成的。
例如:
import multiprocessing as mp
def ret_x(x):
return x
def inform():
print('made a worker!')
pool = mp.Pool(2, maxtasksperchild=2, initializer=inform)
res= pool.map(ret_x, range(8))
print(res)
将正常工作给予:
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
[0, 1, 2, 3, 4, 5, 6, 7]
但是将 range
更改为 20
不会显示正在创建任何额外的工作人员或总共 20 个已完成的任务,即使在预期列表中返回了完成的范围。
made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
worker exiting after 1 tasks
之所以如此,是因为您没有在 pool.map 中明确定义 "chunksize":
map(func, iterable[, chunksize])
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer
来源:https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
对于 8 个项目,考虑 len(pool)=2,chunksize 将为 1 ( divmod(8,2*4)) 所以你看到 (8/1)/2 workers = 4 workers
workers = (len of items / chunksize) / tasks per process
对于 20 个项目,考虑到 len(pool)=2,chunksize 将为 3 (divmode(20,2*4)) 所以你会看到类似 (20/3)/2 = 3.3 workers
对于 40...chunksize=5,workers= (40/5)/5 = 4 workers
如果需要,可以设置chunksize=1
res = pool.map(ret_x, range(40), 1)
你会看到 (20/1)/2 = 10 个工人
python mppp.py
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
所以块大小就像一个进程的单元工作量......或类似的东西。
如何计算块大小:https://hg.python.org/cpython/file/1c54def5947c/Lib/multiprocessing/pool.py#l305