将多个子进程的输出附加到 Python 中的数组

Question

我正在 Python 中编写一个脚本，它以这种方式打开多个子进程：

for file in os.listdir(FOLDER):
    subprocess.Popen(([myprocess]))

现在这个进程可以是 10-20 个运行并行，并且每个进程都会在控制台中输出一行字符串。我想做的是将这些输出（无论以何种顺序）附加到一个数组，当所有过程完成后，继续执行其他操作的脚本。

我不知道如何将每个输出附加到数组，我想检查是否所有子进程都已完成，我可以这样做：

outputs = []
k = len(os.listdir(FOLDER))
if len(outputs) == k 
 print "All processes are done!"

更新！这段代码现在似乎可以工作了：

pids=set()
outputs = []
for file in os.listdir(FOLDER):
    p = subprocess.Popen(([args]), stdout=subprocess.PIPE)
    pids.add(p.pid)
while pids:
    pid,retval=os.wait()
    output = p.stdout.read()
    outputs.append(output)
    print('{p} finished'.format(p=pid))
    pids.remove(pid)

print "Done!"
print outputs

问题是 outputs 看起来像这样

>> Done!
>> ['OUTPUT1', '', '', '', '', '', '', '', '', '']

只填了第一个值，其他都留空，为什么？

Answer 1

你可以等到他们都完成了他们的工作，然后汇总他们的标准输出。要了解它是如何完成的，请参阅 this answer，其中深入介绍了实施。

如果您需要异步执行，您应该为此作业生成一个新线程，并在该线程中进行等待。

如果您需要实时获得有关结果的通知，您可以分别为每个进程生成一个线程，在每个线程中等待它们，然后在它们完成后更新您的列表。

要读取进程的输出，您可以使用 subprocess.PIPE，如 in this answer。

编辑这是一个对我有用的完整示例：

#!/usr/bin/python2
import os
import random
import subprocess
outputs = []
processes = []
for i in range(4):
    args = ['bash', '-c', 'sleep ' + str(random.randint(0, 3)) + '; whoami']
    p = subprocess.Popen(args, stdout=subprocess.PIPE)
    processes.append(p)
while processes:
    p = processes[0]
    p.wait()
    output = p.stdout.read()
    outputs.append(output)
    print('{p} finished'.format(p=p.pid))
    os.sys.stdout.flush()
    processes.remove(p)
print outputs

Answer 2

What I want to do is to append these outputs (no matter in which order) to an array, and when all the processes are done, continue with the script doing other stuff.

#!/usr/bin/env python
import os
from subprocess import Popen, PIPE

# start processes (run in parallel)
processes = [Popen(['command', os.path.join(FOLDER, filename)], stdout=PIPE)
             for filename in os.listdir(FOLDER)]
# collect output
lines = [p.communicate()[0] for p in processes]

要限制并发进程数，可以使用线程池：

#!/usr/bin/env python
import os
from multiprocessing.dummy import Pool, Lock
from subprocess import Popen, PIPE

def run(filename, lock=Lock()):
    with lock: # avoid various multithreading bugs related to subprocess
        p = Popen(['command', os.path.join(FOLDER, filename)], stdout=PIPE)
    return p.communicate()[0]

# no more than 20 concurrent calls
lines = Pool(20).map(run, os.listdir(FOLDER))

后一个代码示例还可以同时读取多个子进程，而前一个代码示例实质上是在相应的标准输出 OS 管道缓冲区已满后序列化执行。

将多个子进程的输出附加到 Python 中的数组

Append output of multiple subprocesses to array in Python

python

arrays

subprocess