Python filter array with multiprocessing

Question

Right now I filter an array with:

arr = [a for a in tqdm(replays) if check(a)]

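However, with hundreds of thousands of elements this takes a long time. I would like to know whether it can be done with multiprocessing, preferably in a neat, compact, pythonic way.

Thanks!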

Answer 1

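I ran into the same problem when trying to group hundreds of thousands of elements, and the solution was to use itertools: https://docs.python.org/3/library/itertools.html

Performance improved a lot, but Python still seems to struggle when sorting, grouping, or filtering large collections in memory.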
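The answer does not say which itertools functions were used, so the following is only a guess at what such a solution might look like. It filters lazily with `compress` and groups with `groupby`. The `check` function and the `replays` data here are hypothetical stand-ins for the ones in the question, and this sketch runs in a single process, so it cuts intermediate allocations rather than adding parallelism.

```python
from itertools import compress, groupby


def check(a):
    # Hypothetical stand-in for the question's check()
    return a % 2 == 0


def remainder(a):
    # Hypothetical grouping key, for illustration only
    return a % 3


replays = list(range(10))  # hypothetical data

# compress() keeps the items whose selector is truthy; map() is lazy,
# so no intermediate list of booleans is built
evens = list(compress(replays, map(check, replays)))

# groupby() only merges adjacent items, so sort by the same key first
by_remainder = {
    key: list(group)
    for key, group in groupby(sorted(replays, key=remainder), key=remainder)
}
```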

Answer 2

Define a parallel filter function, pfilter, that uses multiprocessing:

from multiprocessing import Pool

def pfilter(filter_func, arr, cores):
    with Pool(cores) as p:
        booleans = p.map(filter_func, arr)
        return [x for x, b in zip(arr, booleans) if b]

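The calls to filter_func run in separate worker processes, so the evaluation of each element must be truly independent of the others. p.map still returns its results in input order, which is what lets zip pair each boolean with its element.

With pfilter, your use case (with 4 CPUs) would be: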
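When each call to the filter function is cheap, sending elements to the workers one at a time can cost more than the check itself. One possible refinement, sketched here under assumptions, is to batch elements with the `chunksize` argument of `Pool.imap`. The name `pfilter_chunked`, the default of 1000, and the choice to pass in an existing pool are illustrative and not part of the answer above.

```python
from itertools import compress


def pfilter_chunked(filter_func, arr, pool, chunksize=1000):
    """Filter arr with a pool, sending elements to workers in batches.

    `pool` is assumed to be a multiprocessing.Pool (or an object with the
    same imap() method) that the caller creates and closes.
    """
    items = list(arr)  # materialize once so the input can be iterated twice
    # imap() yields results lazily and in input order
    flags = pool.imap(filter_func, items, chunksize)
    return list(compress(items, flags))


# Usage with a process pool, under the usual `if __name__ == "__main__":` guard:
#     with Pool(4) as p:
#         arr = pfilter_chunked(check, replays, p)
```

Because `flags` is a lazy iterator, wrapping it as `tqdm(flags, total=len(items))` should make the progress bar track completed checks rather than how fast the input is consumed. The best chunk size depends on how expensive the check is, so treat 1000 as a starting point to measure against.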

arr = pfilter(check, tqdm(replays), 4)

However, for some odd reason, filter_func is not allowed to be a lambda expression or defined as a ...
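The likely cause is pickling: multiprocessing has to serialize the function to send it to the worker processes, and the standard pickle module stores a function by its importable name, which a lambda does not have. A common workaround, sketched below, is a module-level function combined with `functools.partial`. The names `above` and `is_picklable` are hypothetical helpers for illustration.

```python
import pickle
from functools import partial


def above(threshold, a):
    # Module-level predicate: pickle can find it again by name
    return a > threshold


def is_picklable(func):
    """Return True if the standard pickle module can serialize func."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# Picklable replacement for `lambda a: a > 10`
check = partial(above, 10)

print(is_picklable(check))
print(is_picklable(lambda a: a > 10))
```

A `partial` of a module-level function carries its bound arguments along when pickled, so `check` can be passed to pfilter where the equivalent lambda cannot.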

Answer 3

The concurrent.futures module provides a nice interface for both multithreaded and multiprocess operations.

from concurrent.futures import ProcessPoolExecutor

def check(a):
    return a % 2 == 0

if __name__ == "__main__":
    array = [1, 2, 3, 4, 5]

    # Executor.map yields results in input order, so zip pairs each flag with its element
    with ProcessPoolExecutor(max_workers=3) as ppe:
        res = [a for a, flg in zip(array, ppe.map(check, array)) if flg]
    print(res)

# [2, 4]
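Since both executor classes share the same `map` method, the filter can be written once and handed either kind of executor. This is a minimal sketch. The name `efilter` is made up for illustration, and a thread pool is used here only because it needs no `__main__` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def is_even(a):
    return a % 2 == 0


def efilter(pred, items, executor):
    """Filter items using any concurrent.futures executor."""
    items = list(items)  # iterated twice: once by map(), once by zip()
    # executor.map() returns results in the order of the inputs
    return [a for a, keep in zip(items, executor.map(pred, items)) if keep]


with ThreadPoolExecutor(max_workers=3) as tpe:
    res = efilter(is_even, [1, 2, 3, 4, 5], tpe)
print(res)
```

Threads mainly help when the check waits on I/O. For a CPU-bound check, pass a ProcessPoolExecutor instead, in which case the predicate must be picklable and the call should sit under an `if __name__ == "__main__":` guard.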
