Implementation of multithreading using concurrent.futures in Python

Question

I wrote a Python script that converts raw data (from an STM microscope) to PNG format, and it runs perfectly on my MacBook Pro.

Below is the simplified Python code:

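import glob
import os

# path (input folder) and savepath (output folder) are defined elsewhere
for root, dirs, file in os.walk(path):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:  # iterates over the characters of the filename
                data_conv(files, file, spaths)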

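But 100 files take 30 to 40 minutes.

Now I would like to use multithreading (with the concurrent.futures library) to reduce the processing time. Following the example in a YouTube video, "Python Threading Tutorial", I tried to modify the Python code.

However, I have to pass too many arguments to the executor.map() method, such as root, dirs, and file, and I don't know how to get any further.

Below is the simplified multithreaded Python code: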

import concurrent.futures
import glob
import os

def raw_data(root, dirs, file):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

# root, dirs and file only exist inside an os.walk() loop, so this raises the NameError below
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(raw_data, root, dirs, file)

NameError: name 'root' is not defined

Any suggestions are welcome. Thanks.
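The NameError happens because root, dirs and file are only bound inside the os.walk() loop. As for passing several arguments: like the built-in map(), Executor.map() accepts several iterables and zips them, calling the function once per position, so an argument that is the same for every call can be supplied with itertools.repeat(). A minimal sketch, where describe() is a hypothetical stand-in for the real per-directory work:

```python
import concurrent.futures
import itertools


def describe(root, dirname, pattern):
    # Hypothetical stand-in for the real per-directory work.
    return f"{root}/{dirname}/{pattern}"


def run(root, dirnames, pattern):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() zips its iterables and stops at the shortest one,
        # so the constant arguments can be infinite repeats.
        return list(executor.map(describe,
                                 itertools.repeat(root),
                                 dirnames,
                                 itertools.repeat(pattern)))


if __name__ == "__main__":
    print(run("data", ["a", "b"], "*.sm4"))
```

Results come back in the same order as the inputs, regardless of which call finishes first.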

Answer 1

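First, as Iain Shelvington pointed out, data_conv appears to be a CPU-bound function, so you won't see much improvement from ThreadPoolExecutor: in CPython, the Global Interpreter Lock lets only one thread execute Python bytecode at a time. Use ProcessPoolExecutor instead. Second, you have to pass the arguments to every instance of the function call, i.e. pass lists of arguments to raw_data. Assuming root and file are the same for every call and dirs is a list: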

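import concurrent.futures

if __name__ == "__main__":  # needed for ProcessPoolExecutor with the spawn start method (macOS, Windows)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # root and file are repeated once for every entry in dirs
        results = executor.map(raw_data, [root] * len(dirs), dirs, [file] * len(dirs))
        for result in results:
            pass  # collect your results here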
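The CPU-bound argument can be tried with a toy workload. A minimal sketch, assuming a hypothetical cpu_heavy() in place of data_conv. Here functools.partial binds the argument that is shared by every call, so map() needs only one iterable. Worker functions must be defined at module level so they can be pickled, and the `if __name__ == "__main__":` guard is required wherever the spawn start method is used (the default on macOS since Python 3.8, and on Windows).

```python
import concurrent.futures
import functools


def cpu_heavy(n, scale):
    # Hypothetical CPU-bound stand-in for data_conv: pure-Python arithmetic.
    return sum(i * i for i in range(n)) * scale


def run_all(sizes, scale=1):
    # partial() fixes the shared argument, so map() only iterates over sizes.
    work = functools.partial(cpu_heavy, scale=scale)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(work, sizes))


if __name__ == "__main__":
    print(run_all([10, 100, 1000], scale=2))
```

partial() is an alternative to repeating constant arguments as lists; both produce one call per item.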

As a side note, you may find working with the file system more pleasant with pathlib, which has been part of the standard library since Python 3.4.
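For illustration, the directory walk from the question might look like this with pathlib. This is a sketch that assumes the layout `path/<subdir>/*.sm4`; the hypothetical helper collect_jobs() only builds (input file, output folder) pairs and does not convert anything. Working with absolute paths also removes the need for os.chdir(), which changes the working directory of the whole process, i.e. of every thread at once.

```python
from pathlib import Path


def collect_jobs(path, savepath):
    """Return (sm4_file, output_dir) pairs for every *.sm4 file one level below path."""
    src, dst = Path(path), Path(savepath)
    jobs = []
    for subdir in sorted(p for p in src.iterdir() if p.is_dir()):
        out_dir = dst / subdir.name
        out_dir.mkdir(parents=True, exist_ok=True)  # replaces the exists()/mkdir() check
        for sm4 in sorted(subdir.glob("*.sm4")):
            jobs.append((sm4, out_dir))
    return jobs
```

The resulting list can be handed straight to executor.map() with a function that takes one pair.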

Answer 2

Thanks to Iain Shelvington and Thenoneman for the suggestions.

Pathlib really did reduce the clutter in my code.

ProcessPoolExecutor works for my CPU-bound function:

with concurrent.futures.ProcessPoolExecutor() as executor:
    # each call to raw_data receives one (root, dirs, files) tuple from os.walk()
    executor.map(raw_data, os.walk(path))
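A variant worth considering submits one task per file rather than one per directory, so the pool stays busy even when folders hold different numbers of files. This is a sketch under assumptions: convert_one() is a hypothetical wrapper that only computes the output name where the real code would call data_conv (whose signature isn't shown). Wrapping map() in list() matters too: its results are lazy, and an exception raised in a worker only resurfaces when that result is retrieved.

```python
import concurrent.futures
import os


def convert_one(job):
    # Hypothetical per-file task: the real code would call data_conv here.
    sm4_path, out_dir = job
    name = os.path.splitext(os.path.basename(sm4_path))[0] + ".png"
    return os.path.join(out_dir, name)


def find_jobs(path, savepath):
    # One (input file, output folder) pair per *.sm4 file, mirroring the folder tree.
    jobs = []
    for root, dirs, files in os.walk(path):
        rel = os.path.relpath(root, path)
        for fname in sorted(files):
            if fname.endswith(".sm4"):
                jobs.append((os.path.join(root, fname), os.path.join(savepath, rel)))
    return jobs


def convert_all(path, savepath):
    jobs = find_jobs(path, savepath)
    for _, out_dir in jobs:
        os.makedirs(out_dir, exist_ok=True)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # list() consumes every result, so worker exceptions are re-raised here.
        return list(executor.map(convert_one, jobs))
```

Call convert_all() from inside an `if __name__ == "__main__":` block, for the same spawn-related reason as above.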

Tags: python, multithreading, python-multithreading, python-3.x, concurrent.futures