Using multiprocessing.Pool in Python with a function returning a custom object

I am using multiprocessing.Pool to speed up a computation, since I call one function many times and then collate the results. Here is a snippet of my code:

import multiprocessing
from functools import partial

def Foo(id: int, constant_arg1: str, constant_arg2: str):
    custom_class_obj = CustomClass(constant_arg1, constant_arg2)
    custom_class_obj.run()  # this changes some attributes of custom_class_obj

    if something:  # placeholder for some condition
        return None
    else:
        return [custom_class_obj]


def parallel_run(iters: int, a: str, b: str):
    # k is the number of worker processes (defined elsewhere)
    with multiprocessing.Pool(processes=k) as pool:
        # create the partial function object before passing it to the pool
        partial_func = partial(Foo, constant_arg1=a, constant_arg2=b)

        # create the list of variable ids
        iter_list = list(range(iters))
        all_runs = pool.map(partial_func, iter_list)

    return all_runs

This throws the following error in the multiprocessing module:

multiprocessing.pool.MaybeEncodingError: Error sending result: '[[<CustomClass object at 0x1693c7070>], [<CustomClass object at 0x1693b88e0>], ....]'
Reason: 'TypeError("cannot pickle 'module' object")'

How can I resolve this?

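I was able to reproduce the error message with a minimal example of an unpicklable class. The error basically says that an instance of your class can't be pickled because it contains a reference to a module, and modules are not picklable. You need to comb through CustomClass to make sure its instances don't hold things like open file handles, module references, and so on. If you do need those things, you should use __getstate__ and __setstate__ to customize the pickle and unpickle process.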
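One way to comb through an instance is to try pickling each attribute on its own and see which ones fail. The sketch below is only an illustration: find_unpicklable_attrs and Example are hypothetical names, and Example is a stand-in for your CustomClass, whose real attributes I can't see. It only checks attributes stored in the instance __dict__, so classes that use __slots__ would need a different approach.

```python
import pickle


def find_unpicklable_attrs(obj):
    """Return {attribute name: error text} for attributes that fail to pickle.

    Hypothetical helper: inspects only the instance __dict__.
    """
    problems = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle can raise several exception types
            problems[name] = f"{type(exc).__name__}: {exc}"
    return problems


class Example:
    """Hypothetical stand-in for CustomClass."""

    def __init__(self):
        import os
        self.value = 42   # plain int: pickles fine
        self.module = os  # module reference: cannot be pickled


if __name__ == "__main__":
    # Prints a dict whose only key is the offending attribute name
    print(find_unpicklable_attrs(Example()))
```

Running something like this on one CustomClass instance in the main process (no Pool involved) should point at the attribute that holds the module.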

A distilled example of your error:

from multiprocessing import Pool
from functools import partial

class klass:
    def __init__(self, a):
        self.value = a
        import os
        self.module = os  # this fails: a module can't be pickled and sent back to the main process

def foo(a, b, c):
    return klass(a+b+c)

if __name__ == "__main__":
    with Pool() as p:
        a = 1
        b = 2
        bar = partial(foo, a, b)
        res = p.map(bar, range(10))
    print([r.value for r in res])
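
If the instance really does need the module reference, a minimal sketch of the __getstate__ / __setstate__ approach could look like the following. It reuses the klass name from the example above and assumes the module can simply be re-attached after unpickling; your CustomClass may need to drop and rebuild different attributes. A plain pickle round trip is used here instead of a Pool, since that is the same serialization step the pool performs when it sends a result back to the main process.

```python
import os
import pickle


class klass:
    def __init__(self, a):
        self.value = a
        self.module = os  # not picklable on its own

    def __getstate__(self):
        # Copy the instance dict and drop the entry that cannot be pickled
        state = self.__dict__.copy()
        del state["module"]
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes, then re-attach the module
        self.__dict__.update(state)
        self.module = os


if __name__ == "__main__":
    original = klass(3)
    restored = pickle.loads(pickle.dumps(original))  # no TypeError now
    print(restored.value, restored.module is os)
```

With these two methods in place, returning klass instances from the function passed to p.map should no longer raise MaybeEncodingError, as long as every remaining attribute is picklable.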