Multiprocessing Pool slow when calling external module
My script calls the librosa module to compute Mel-frequency cepstral coefficients (MFCCs) for short pieces of audio. After loading the audio, I want to compute these (along with some other audio features) as fast as possible, hence the multiprocessing.
Problem: the multiprocessing variant is much slower than the sequential one. Profiling shows that more than 90% of my code's time is spent in <method 'acquire' of '_thread.lock' objects>. That would not be surprising with many small tasks, but in one test case I split my audio into 4 chunks and process each in a separate process. There I would expect the overhead to be minimal, yet it is almost as bad as with many small tasks.
As I understand it, the multiprocessing module forks almost everything, so there should not be any lock contention. The results, however, suggest otherwise. Could it be that the underlying librosa module holds some kind of internal lock?
My profiling results in plain text: https://drive.google.com/open?id=17DHfmwtVOJOZVnwIueeoWClUaWkvhTPc
As an image: https://drive.google.com/open?id=1KuZyo0CurHd9GjXge5CYQhdWn2Q6OG8Z
重现 "problem" 的代码:
import time
import numpy as np
import librosa
from functools import partial
from multiprocessing import Pool

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60)  # load audio sample
y = np.repeat(y, 10)  # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2)  # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split the long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
get_mfcc_in_loop(y, sr, sample_len)
print('Time single process:', time.time() - start)

# Let's now test feeding these small arrays to a pool of 4 workers. Since computing
# MFCCs for these small arrays is fast, I'd expect this to be not that fast
start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
with Pool(n_proc) as pool:
    func = partial(librosa.feature.mfcc, sr=sr)
    result = pool.map(func, y_windowed)
print('Time multiprocessing (many small tasks):', time.time() - start)

# Here we split the audio into 4 chunks and process them separately. This I'd expect
# to be fast and somehow it isn't. What could be the cause? Anything to do about it?
start = time.time()
y_split = np.array_split(y, n_proc)
with Pool(n_proc) as pool:
    func = partial(get_mfcc_in_loop, sr=sr, sample_len=sample_len)
    result = pool.map(func, y_split)
print('Time multiprocessing (a few large tasks):', time.time() - start)
Results on my machine:
- Time single process: 8.48 s
- Time multiprocessing (many small tasks): 44.20 s
- Time multiprocessing (a few large tasks): 41.99 s

Any idea what is causing this? Better yet, how can I make it faster?
To investigate what was going on, I ran top -H and noticed that more than 60 threads were spawned! That was it. It turns out librosa and its dependencies spawn many extra threads that together wreck the parallelism.
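As an aside, if the extra threads come from OpenMP/BLAS pools (a common source in numpy-backed libraries), they can often be curbed without changing the parallelization code at all, by pinning the pool sizes through environment variables. A minimal sketch, assuming these libraries read the variables at import time, so they must be set before numpy/librosa are imported:

```python
import os

# Common knobs read by OpenMP/BLAS-backed libraries when they initialize.
# Setting them to "1" keeps each worker process single-threaded, so
# n_proc processes do not oversubscribe the CPU cores.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"

# Imports of numpy/librosa would go here, *after* the variables are set;
# forked worker processes inherit the same environment.
```

Whether this fully fixes the slowdown depends on where the threads actually come from; it is a cheap first experiment before restructuring the code.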
Solution

The oversubscription problem is well described in the joblib docs. Let's use joblib, then.
import time
import numpy as np
import librosa
from joblib import Parallel, delayed

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60)  # load audio sample
y = np.repeat(y, 10)  # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2)  # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split the long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
Parallel(n_jobs=n_proc, backend='multiprocessing')(
    delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len)
    for data in y_windowed)
print('Time multiprocessing with joblib (many small tasks):', time.time() - start)

y_split = np.array_split(y, n_proc)
start = time.time()
Parallel(n_jobs=n_proc, backend='multiprocessing')(
    delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len)
    for data in y_split)
print('Time multiprocessing with joblib (a few large tasks):', time.time() - start)
Results:
- Time multiprocessing with joblib (many small tasks): 2.66 s
- Time multiprocessing with joblib (a few large tasks): 2.65 s

That is 15x faster than with the multiprocessing module.