Shared memory dictionary creation too slow using multiprocessing.Manager()

Question

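I have a script in which I need to read an Excel file and store the information in dictionaries.

I have to create the dictionaries with multiprocessing.Manager() so that I can retrieve the calculation output from the function I run with multiprocessing.Process.

The problem is that creating a dictionary with multiprocessing.Manager() and manager.dict() takes about 400 times longer than using plain dict() (and dict() is not a shared memory structure).

Here is sample code to verify the difference: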

import xlrd
import multiprocessing
import time

def DictManager(inp1, inp2):
    manager = multiprocessing.Manager()
    Dict = manager.dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def DictNoManager(inp1, inp2):
    Dict = dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def ReadFileManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    line = 2
    for line in range(2,sheet.nrows):
        inp1 = sheet.cell(line,2).value
        inp2 = sheet.cell(line,3).value
        dictionary = DictManager(inp1, inp2)
        DictList.append(dictionary)
    print 'Done!'

def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    line = 2
    for line in range(2,sheet.nrows):
        inp1 = sheet.cell(line,2).value
        inp2 = sheet.cell(line,3).value
        dictionary = DictNoManager(inp1, inp2)
        DictList.append(dictionary)
    print 'Done!'


if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'

    start = time.time()
    ReadFileNoManager(excelfile)
    end = time.time()
    print 'Run time NoManager:', end - start, 's'

    start = time.time()
    ReadFileManager(excelfile)
    end = time.time()
    print 'Run time Manager:', end - start, 's'

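Is there a way to improve the performance of multiprocessing.Manager()?

If not, is there any other shared memory structure I could use instead to do what I am doing and improve performance?

Thank you very much for your help!

EDIT:

My main function uses the following code: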

def MyFunction(Dictionary, otherdata):
    # Perform calculation and save results in the dictionary
    # (Value1 and Value2 stand for the computed results)
    Dictionary['Output1'] = Value1
    Dictionary['Output2'] = Value2

ListOfProcesses = []
for Dict in DictList:
    p = multiprocessing.Process(target=MyFunction, args=(Dict, otherdata))
    p.start()
    ListOfProcesses.append(p)
for p in ListOfProcesses:
    p.join()

If I don't use the manager, I cannot retrieve the outputs.

Answer 1

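As I mentioned in the comments, I suggest reading the Excel file in the main process and then using multiprocessing for the function calls. Just add your functionality to apply_function and make sure it returns whatever you want. results will contain a list of your results.

Update: I changed map to starmap to include your extra argument.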
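
As a rough illustration of what "returns whatever you want" could look like, here is a minimal sketch. The arithmetic inside apply_function, the sample rows, and the run_all helper are assumptions made up for the example (the real calculation is not shown in the question), and it assumes Python 3.3 or later for Pool.starmap and the Pool context manager.

```python
import multiprocessing
from itertools import repeat


def apply_function(your_dict, otherdata):
    # Hypothetical calculation: the real one is not shown in the question.
    result = dict(your_dict)  # work on a copy and hand it back to the parent
    result['Output1'] = your_dict['input1'] + otherdata
    result['Output2'] = your_dict['input2'] * otherdata
    return result


def run_all(dict_list, otherdata):
    # starmap unpacks each (dict, otherdata) pair into the two arguments of
    # apply_function and returns the results in the same order as the input.
    with multiprocessing.Pool(2) as pool:
        return pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))


if __name__ == '__main__':
    # Made-up rows standing in for what would be read from the Excel file.
    dict_list = [
        {'input1': 1, 'input2': 2, 'Output1': None, 'Output2': None},
        {'input1': 3, 'input2': 4, 'Output1': None, 'Output2': None},
    ]
    results = run_all(dict_list, 10)
    print(results)
```

Because the results travel back as return values, plain dictionaries are enough here and no manager is involved.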

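import multiprocessing
from itertools import repeat

import xlrd

def ReadFileNoManager(excelfile):
    # DictNoManager is the plain-dict helper defined in the question.
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictNoManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')
    return DictList

def apply_function(your_dict, otherdata):
    # Put your calculation here and return whatever you need.
    pass

if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'
    otherdata = None  # placeholder for the extra argument from the question
    dict_list = ReadFileNoManager(excelfile)
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    # starmap requires Python 3.3+; each (dict, otherdata) pair becomes one call
    results = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))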
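
If shared dictionaries are still required, one thing that may account for much of the slowdown is that DictManager in the question calls multiprocessing.Manager() once per row, and every such call starts its own server process. A minimal sketch of reusing a single manager instead is below. The helper names make_shared_dict, my_function and run_with_one_manager, the sample rows, and the arithmetic are all assumptions for illustration, and the with-statement on the manager assumes Python 3.3 or later. I have not measured how much this helps with your file.

```python
import multiprocessing


def make_shared_dict(manager, inp1, inp2):
    # manager.dict() asks the already running server process for a new
    # proxy, so no additional process is started per dictionary.
    return manager.dict(
        {'input1': inp1, 'input2': inp2, 'Output1': None, 'Output2': None}
    )


def my_function(shared, otherdata):
    # Hypothetical calculation standing in for the real one.
    shared['Output1'] = shared['input1'] + otherdata
    shared['Output2'] = shared['input2'] * otherdata


def run_with_one_manager(rows, otherdata):
    with multiprocessing.Manager() as manager:  # one server process in total
        dict_list = [make_shared_dict(manager, a, b) for a, b in rows]
        processes = [
            multiprocessing.Process(target=my_function, args=(d, otherdata))
            for d in dict_list
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # Copy into plain dicts before the manager shuts down.
        return [dict(d) for d in dict_list]


if __name__ == '__main__':
    print(run_with_one_manager([(1, 2), (3, 4)], 10))
```

Every read or write on a proxy is still a round trip to the manager process, so returning results from a pool as shown earlier is likely to remain the cheaper option.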
