Sharing a dictionary and an array among a pool of processes

I have been trying to create a dictionary that has device MAC IDs as keys and the information corresponding to each MAC in a list, something like:

{00-00-0A-14-01-06:[['CMTS-51-55_10.20', '10.20.1.1', '342900', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424419744000', '692306', 'SignalingDown', '1', '118800000', '990000', '0', '0', '0', '342900'], 
['CMTS-51-55_10.20', '10.20.1.1', '343800', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424420644000', '692306', 'SignalingDown', '1', '118800000', '990000', '0', '0', '0', '343800'], 
['CMTS-51-55_10.20', '10.20.1.1', '342900', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424419744000', '377773', 'SignalingUp', '2', '118800000', '990000', '0', '0', '0', '342900']]} 

These values are retrieved from multiple files kept in several folders, and a folder can contain multiple files.

I feed this list of folders to a pool of processes, so that all the files in one folder are handled by a single worker process.

I maintain a local dictionary (a collections.defaultdict) to collect the complete information, and then copy that information into the shared dictionary (a manager.dict) that I pass as an argument to the pool workers.

I also pass a character array (multiprocessing.Array) to share some template information between the child processes and the main process.

I am trying to verify the sharing behavior with the multiprocessing test below, but it does not seem to work.

Could someone please help me?

#!/usr/local/bin/pypy

from multiprocessing import Process
from multiprocessing import Pool, Manager, Value, Array
import collections
from collections import defaultdict
import itertools
import os

def info(title):
    print title
    print 'module name:', __name__
    if hasattr(os, 'getppid'):  # only available on Unix
        print 'parent process:', os.getppid()
    print 'process id:', os.getpid()

def f(template, mydict):
    name = 'bob'
    info('function f')
    resultDeltaArray = collections.defaultdict(list)
    resultDeltaArray['b'].append("hi")
    resultDeltaArray['b'].append("bye")
    resultDeltaArray['c'].append("bye")
    resultDeltaArray['c'].append("bye")
    template = "name"
    print resultDeltaArray
    #print "templaate1", template
    for k, v in resultDeltaArray.viewitems():
        mydict[k] = v
    print 'hello', name
    #mydict = resultDeltaArray
    for k, v in mydict.items():
        print mydict[k]
        #del mydict[k]

if __name__ == '__main__':
    info('main line')
    manager = Manager()
    mydict = manager.dict()
    template = Array('c', 50)
    #mydict[''] = []
    #print mydict
    todopool = Pool(2)
    # attempt to run f(template, mydict) in each pool worker
    todopool.map_async(f, itertools.repeat(template), itertools.repeat(mydict))
    #print "hi"
    #p = Process(target=f, args=('bob',template,mydict))
    #p.start()
    #p.join()
    print mydict
    mydict.clear()
    print mydict

    print "template2", template

The code above is just to test the multiprocessing part; it is not the actual implementation. In this case it simply hangs after printing the following, without doing anything:

main line
module name: __main__
parent process: 27301
process id: 27852

When I try to interrupt the process with Ctrl-C, it gets stuck again after printing:

Traceback (most recent call last):
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 258, in _bootstrap
  Process PoolWorker-2:
Traceback (most recent call last):
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python    /2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python /2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/pool.py", line 85, in worker
    self.run()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/pool.py", line 85, in worker
    task = get()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/queues.py", line 374, in get
    racquire()
KeyboardInterrupt
    task = get()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/queues.py", line 376, in get
    return recv()

Am I using these correctly? Does a Pool object not allow a multiprocessing Array or a manager.dict to be passed as arguments? Is there another way to do the same thing?

Dicts (implemented as in-memory hash tables) are not designed to facilitate sharing between processes, since processes by nature do not share memory.

Consider using threads, which do share memory, perhaps with from multiprocessing.pool import ThreadPool as Pool. Or use an alternative structure, such as shelve (a persistent, shareable data store), or use sqlite3 so that multiple processes access the same shared database, or install and use memcached or some other data store designed for sharing across processes.
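
For illustration, a minimal sketch of the ThreadPool approach; since the workers are threads, a plain dict can be shared directly. The two-element task list and the payload values here are just placeholders:

from multiprocessing.pool import ThreadPool as Pool
import collections

def f(args):
    # Pool.map passes a single argument, so pack template and dict into a tuple
    template, shared = args
    local = collections.defaultdict(list)
    local['b'].append("hi")
    local['c'].append("bye")
    for k, v in local.items():
        shared[k] = v

if __name__ == '__main__':
    shared = {}  # a plain dict works here because threads share memory
    tasks = [("template", shared)] * 2  # a finite iterable, unlike itertools.repeat
    pool = Pool(2)
    pool.map(f, tasks)  # blocks until all tasks have finished
    pool.close()
    pool.join()
    print shared  # -> {'b': ['hi'], 'c': ['bye']}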

The documentation also shows how to share data across processes using queues and pipes, though that may not be what you want (a shared key/value store): https://docs.python.org/2.7/library/multiprocessing.html#exchanging-objects-between-processes
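
For completeness, a minimal sketch of the queue approach from those docs, where a child process sends a dict back to the parent; the payload shown is just a placeholder:

from multiprocessing import Process, Queue

def worker(q):
    # build a result in the child and hand it back through the queue
    q.put({'00-00-0A-14-01-06': [['CMTS-51-55_10.20', '10.20.1.1']]})

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print q.get()  # blocks until the child has put its result
    p.join()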