
Starting and waiting for multiple jobs started in background

I want to start 10 jobs at a time, wait for them to finish, then start another 10 jobs in parallel in the background, and repeat this until all 100 jobs are done.

Here is the Python code that calls the shell script:

from subprocess import call

# other code here.

# This variable is basically # of jobs/batch.
windowsize = 10

# Here is how I call the shell command. I have 100 jobs in total that I want as 10 batches with 10 jobs/batch.
for i in range(0, 100, windowsize):
    numjobs = i + windowsize

    # Start 10 jobs in parallel at a time
    for j in range(i, numjobs):
        call(["./myscript.sh", "/usr/share/file1.txt", "/usr/share/file2.txt"], shell=True)

    # Hoping to wait here until the 10 jobs that were just started in the background finish.

In my shell script I have this:


# I start the job in the background. Each job takes a few minutes to finish.

shell command   &

Unfortunately, all 100 jobs are started at once, instead of in 10 batches of 10 jobs each.

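There is no (direct) way to wait for grandchild processes. Add wait at the end of the myscript.sh script, so that the script does not exit until its background job has finished.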
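
The effect can be seen in a minimal sketch, assuming a POSIX system with sh and sleep available. The inline sh -c strings are illustrative stand-ins for myscript.sh, not the asker's actual script. call() returns as soon as the shell it started exits, so a job left in the background is not waited for unless the script itself ends with wait.

```python
import time
from subprocess import call


def timed_call(script):
    """Run script via sh -c and return how long call() blocked, in seconds."""
    start = time.monotonic()
    call(["sh", "-c", script])
    return time.monotonic() - start


# Stand-in for a script that backgrounds its job and exits immediately.
no_wait = timed_call("sleep 1 &")

# Same job, but the script waits for its background children before exiting.
with_wait = timed_call("sleep 1 & wait")

print("without wait: %.2fs, with wait: %.2fs" % (no_wait, with_wait))
```

The first call should return almost immediately, while the second blocks for roughly the duration of the background job.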

To limit the number of concurrently running child processes, you can use a thread pool:

#!/usr/bin/env python
import logging
from multiprocessing.pool import ThreadPool
from subprocess import call

windowsize = 10
cmd = ["./myscript.sh", "/usr/share/file1.txt", "/usr/share/file2.txt"]

def run(i):
    return i, call(cmd)

logging.basicConfig(format="%(asctime)-15s %(message)s", datefmt="%F %T",
                    level=logging.INFO)
pool = ThreadPool(windowsize)
for i, rc in pool.imap_unordered(run, range(100)):
    logging.info('%s-th command returns %s', i, rc)

Note: shell=True has been removed.
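
Why dropping shell=True matters can be shown with a small sketch, assuming a POSIX system. Here echo is only an illustrative stand-in for ./myscript.sh. When a list is passed together with shell=True, only the first item is run as the shell command; the remaining items become arguments to the shell itself, so the script would never receive the two file paths.

```python
from subprocess import check_output

# With shell=True and a list, this runs: /bin/sh -c "echo" "hello".
# "hello" becomes the shell's $0, so echo receives no arguments.
with_shell = check_output(["echo", "hello"], shell=True)

# Without shell=True, the list is the program followed by its arguments.
without_shell = check_output(["echo", "hello"])

print(repr(with_shell), repr(without_shell))
```

On POSIX the first call yields only a newline, while the second yields the expected hello followed by a newline.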