Python multiprocessing/multithreading for a whole piece of code

Question

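I have not used multiprocessing or multithreading in my Python code so far.

My code is long (over 600 lines), and I need it to run on multiple CPUs.

So I looked at how multiprocessing/threading is used, but I could not find a way to apply it to the code as a whole.

The code has this form:

for loop
read csv
do some preprocessing
take the mean of the values
compare with other values ...

If I have to edit all of the code for multiprocessing, it will take a lot of time. If you know how to apply multiprocessing to the whole code, could you please tell me?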

Answer 1

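To parallelize a function across multiple CPU cores, the function generally must not mutate global state, and each call must be independent of every other call. Consider this hypothetical function, which respects those conditions (the comparison with other values has been removed):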

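from pathlib import Path

# read_csv, pre_processing, mean and Value stand for your own code
def f(file: Path) -> Value:
  data = read_csv(file)             # load one CSV file
  processed = pre_processing(data)  # per-file preprocessing steps
  return mean(processed)            # one result per file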
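
To make the independence condition concrete, here is a minimal sketch of what a filled-in version of such a function might look like using only the standard library. The name mean_of_file, the column name "value", and the rule of skipping empty cells are all assumptions for illustration, not part of the original code.

```python
import csv
from pathlib import Path
from statistics import mean


def mean_of_file(file: Path, column: str = "value") -> float:
    """Read one CSV file and return the mean of one numeric column.

    All inputs arrive as arguments and the only output is the return
    value, so calls on different files cannot interfere with each other.
    """
    with open(file, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Hypothetical preprocessing step: drop rows whose cell is empty or missing.
    values = [
        float(row[column])
        for row in rows
        if (row.get(column) or "").strip()
    ]
    return mean(values)
```

Because nothing outside the function is read or written, the order in which files are processed does not matter, which is what allows the calls to be handed to a pool of workers.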

You can easily run f across multiple threads with the built-in concurrent.futures package:

from concurrent.futures import ThreadPoolExecutor

files = ["/path/1/", ...]  # List of files

with ThreadPoolExecutor() as executor:
  values = executor.map(f, files)

# Compare values here
for value in values:
  ...
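
The comparison step depends on several results at once, so it is easiest to keep it sequential and run it after the parallel part has finished. The following is a rough sketch of that split; compare_to_overall and run are hypothetical names, and comparing each file against the overall mean is only an assumed example of a comparison.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def compare_to_overall(per_file):
    """Return each file's value minus the mean over all files."""
    overall = mean(list(per_file.values()))
    return {name: value - overall for name, value in per_file.items()}


def run(f, files):
    """Apply f to every file in parallel, then compare sequentially."""
    with ThreadPoolExecutor() as executor:
        # executor.map yields results in the same order as `files`.
        results = list(executor.map(f, files))
    return compare_to_overall(dict(zip(files, results)))
```

With this layout only the per-file function has to be free of shared state; the rest of the script can stay as it is.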

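You can also use ProcessPoolExecutor for multiprocessing. It has the same interface, and for CPU-bound work it is usually the better choice, because threads in standard CPython share the global interpreter lock and do not run Python bytecode on several cores at once.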
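
As a small sketch of the process-based variant: the worker function has to live at module level so it can be pickled and sent to the worker processes, and the pool should be started under an `if __name__ == "__main__":` guard. The function square and the integer inputs are placeholders standing in for the per-file work and the file list.

```python
from concurrent.futures import ProcessPoolExecutor


def square(n: int) -> int:
    """Placeholder for CPU-bound per-file work (defined at module level)."""
    return n * n


def main() -> list:
    inputs = [1, 2, 3, 4]  # hypothetical inputs standing in for file paths
    with ProcessPoolExecutor() as executor:
        # Same map interface as ThreadPoolExecutor, but each call
        # runs in a separate worker process.
        return list(executor.map(square, inputs))


if __name__ == "__main__":
    # The guard stops worker processes from re-running this block when
    # they import the module (needed with the "spawn" start method).
    print(main())
```

Arguments and return values are pickled to cross the process boundary, so this pays off mainly when each call does a meaningful amount of work.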

Tags: python, multithreading, multiprocessing