Python: How to multiprocess my function with multiple arguments?

Question

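My function sound_proc.windowing() cuts the sound files in one directory into segments and saves those segments to another directory. To process every file in the directory, I iterate over them with a for loop: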

# emodb_path_src = source folder containing all the sound files
# 512 = fixed integer (the window size)
# emodb_path_train = destination folder for all the cut files

files = [l for l in listdir(emodb_path_src)]

for index, file in enumerate(files):
    print(f'File: {index+1}/{len(files)}')
    sound_proc.windowing(f'{emodb_path_src}{file}', 512, emodb_path_train)

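Unfortunately, this process is very slow because only one processor core is used. I have already tried multiprocessing with Pool, but I could not get it to work. It would be great if someone could give me a hint on how to get this running on multiple cores.

Thanks in advance, and have a nice day!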

Answer 1

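Yes, you can use a multiprocessing Pool together with starmap. The trick is to create an iterable of iterables that holds the arguments for every function call, like so: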

import multiprocessing as mp
from os import listdir

# Precreate the args
args = [(f'{emodb_path_src}{file}', 512, emodb_path_train) for file in listdir(emodb_path_src)]

with mp.Pool(mp.cpu_count()) as pool:
    print(pool.starmap(sound_proc.windowing, args))
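For anyone who wants to try the pattern without the real audio code, here is a minimal self-contained sketch. The windowing function below is a hypothetical stand-in for sound_proc.windowing, and the folder and file names are made up. It also wraps the pool in a main guard, which is needed on platforms that start worker processes by re-importing the script, and it uses os.path.join so the source path does not have to end with a separator.

```python
import multiprocessing as mp
import os


def windowing(sound_file, window_size, dest):
    """Hypothetical stand-in for sound_proc.windowing: just echoes its arguments."""
    return (sound_file, window_size, dest)


def build_args(src_dir, window_size, dest_dir, names):
    # One tuple per call; starmap unpacks each tuple into positional arguments.
    # os.path.join avoids relying on a trailing separator in src_dir.
    return [(os.path.join(src_dir, name), window_size, dest_dir) for name in names]


def main():
    src_dir, dest_dir = "emodb/src", "emodb/train"  # hypothetical paths
    names = ["a.wav", "b.wav"]  # hypothetical; in practice use os.listdir(src_dir)
    args = build_args(src_dir, 512, dest_dir, names)
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.starmap(windowing, args)
    print(results)


if __name__ == "__main__":
    main()
```

The function handed to the pool has to be defined at module level so the worker processes can import it.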

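Depending on how I/O-bound your problem is, you may also want to try ThreadPool. It is simpler to use and, in some cases, even faster than Pool because it has less setup overhead. Another advantage is that the threads run in the same process, so they all see the same variables and nothing has to be pickled. Keep in mind that they also share a single GIL, so threads only help when the work is I/O-bound or happens in code that releases the GIL. Here is a snippet; note that the only differences are a new import and the with statement: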

import multiprocessing as mp
from os import listdir
from multiprocessing.pool import ThreadPool # thread-based Pool

# Precreate the args
args = [(f'{emodb_path_src}{file}', 512, emodb_path_train) for file in listdir(emodb_path_src)]

# If it's very I/O bound it might even be worthwhile creating more threads than CPU count!
with ThreadPool(mp.cpu_count()) as pool:
    print(pool.starmap(sound_proc.windowing, args))
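One thing starmap does not give you is the per-file progress output from the original loop. A possible way to get it back, sketched here under assumptions, is to fix the two constant arguments with functools.partial and consume results through imap_unordered as they finish. The windowing function is again a hypothetical stand-in, and the worker count of 4 is arbitrary.

```python
from functools import partial
from multiprocessing.pool import ThreadPool


def windowing(sound_file, window_size, dest):
    """Hypothetical stand-in for sound_proc.windowing."""
    return f"{dest}/{sound_file}:{window_size}"


def process_all(files, window_size, dest, workers=4):
    # Bind the constant arguments so each task only needs the file name.
    task = partial(windowing, window_size=window_size, dest=dest)
    results = []
    with ThreadPool(workers) as pool:
        # imap_unordered yields each result as soon as it is ready,
        # so the counter reflects completed files, not submission order.
        for done, result in enumerate(pool.imap_unordered(task, files), start=1):
            print(f"File: {done}/{len(files)}")
            results.append(result)
    return results
```

Since results arrive in completion order, sort them or switch to imap if the original file order matters.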

Answer 2

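In the meantime I found a good solution to the problem. Gerard's code also works like a charm. I would like to briefly show how I solved it in the end and how much time the different approaches save.

The plain for loop from the original post:

Computation time: ~457.7 seconds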

files = [l for l in listdir(emodb_path_src)]

for index, file in enumerate(files):
    print(f'File: {index+1}/{len(files)}')
    sound_proc.windowing(f'{emodb_path_src}{file}', 512, emodb_path_train)

The solution provided by Gerard:

Computation time: ~75.2 seconds

import multiprocessing as mp
from os import listdir

args = [(f'{emodb_path_src}{file}', 512, emodb_path_train) for file in listdir(emodb_path_src)]
    
with mp.Pool(mp.cpu_count()) as pool:
    pool.starmap(sound_proc.windowing, args)

Using joblib.Parallel:

Computation time: ~68.8 seconds

from joblib import Parallel, delayed
import multiprocessing as mp

Parallel(n_jobs=mp.cpu_count(), backend='multiprocessing')(
    delayed(sound_proc.windowing)(
        sound_file=f'{emodb_path_src}{file}',
        window_size=512,
        dest=emodb_path_train,
    )
    for file in files
)
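The post does not say how the durations above were measured. One simple way to take comparable wall-clock timings, shown here only as a sketch, is a small helper around time.perf_counter. The helper name timed is made up for illustration.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Example with a trivial workload; replace it with the loop or pool call to compare.
    total, seconds = timed(sum, range(1_000_000))
    print(f"Computation time: ~{seconds:.1f} seconds")
```

Wrapping each variant in its own function and passing it to timed keeps the measurements on an equal footing.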

python

multithreading

multiprocess