Interaction between pathos.ProcessingPool and pickle
I have a list of computations that I need to run. I parallelize them with
from pathos.multiprocessing import ProcessingPool
pool = ProcessingPool(nodes=7)
values = pool.map(helperFunction, someArgs)
helperFunction does create a class called Parameters, which is defined in the same file as helperFunction:
import otherModule

class Parameters(otherModule.Parameters):
    ...
So far, so good. helperFunction does some computations based on the Parameters object, changes some of its attributes, and finally stores it with pickle. Here is the relevant excerpt of the helper (from a different module) that does the saving:
import pickle
import hashlib
import os

class cacheHelper():

    def __init__(self, fileName, attr=[], folder='../cache/'):
        self.folder = folder
        if len(attr) > 0:
            attr = self.attrToName(attr)
        else:
            attr = ''
        self.fileNameNaked = fileName
        self.fileName = fileName + attr

    def write(self, objects):
        with open(self.getFile(), 'wb') as output:
            for object in objects:
                pickle.dump(object, output, pickle.HIGHEST_PROTOCOL)
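For context, this is roughly how the pieces presumably fit together inside helperFunction; the actual call site is not shown above, so the cache key, the attr list, and the stand-in computation below are all assumptions:
# hypothetical sketch only -- Parameters and cacheHelper are the classes quoted
# above; the key 'result', the attr list and the score attribute are made up
def helperFunction(arg):
    params = Parameters()                     # subclass of otherModule.Parameters
    params.score = arg * arg                  # stand-in for the real computation
    cache = cacheHelper('result', attr=[arg])
    cache.write([params])                     # -> pickle.dump() runs inside the worker
    return params.score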
When it gets to pickle.dump(), it raises an exception that is hard to debug because the debugger does not step into the worker that actually hits it. So I set a breakpoint right before the dump happens and entered the command manually. Here is the output:
>>> pickle.dump(objects[0], output, pickle.HIGHEST_PROTOCOL)
Traceback (most recent call last):
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-4d2cbb7c63d1>", line 1, in <module>
    pickle.dump(objects[0], output, pickle.HIGHEST_PROTOCOL)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 1376, in dump
    Pickler(file, protocol).dump(obj)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 396, in save_reduce
    save(cls)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/site-packages/dill/dill.py", line 1203, in save_type
    StockPickler.save_global(pickler, obj)
  File "/usr/local/anaconda2/envs/myenv2/lib/python2.7/pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <class '__main__.Parameters'>: it's not found as __main__.Parameters
Oddly, this does not happen when I don't parallelize, i.e. when I loop over helperFunction manually. I am fairly sure that I am opening the correct Parameters (and not the parent class).
I know it is hard to debug without a reproducible example, and I don't expect any solutions for that part. Perhaps the more general question is:
What do you have to watch out for when parallelizing code that uses pickle.dump() through another module?
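A quick check from that breakpoint, for anyone who hits the same error: pickle stores classes by reference, i.e. as <module>.<name>, and the traceback says that exact lookup fails inside the worker. Something along these lines (reusing the objects variable from the session above) shows what the worker will try to resolve:
# run at the breakpoint in the worker, before the dump
cls = type(objects[0])
print(cls.__module__, cls.__name__)              # e.g. ('__main__', 'Parameters')

# the pickler's save_global (visible in the traceback) effectively repeats this
# lookup and fails unless it finds the very same class object under that name
import importlib
mod = importlib.import_module(cls.__module__)
print(getattr(mod, cls.__name__, None) is cls)   # False here means the dump will fail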
Straight from the Python docs:
12.1.4. What can be pickled and unpickled? The following types can be pickled:
- None, True, and False
- integers, floating point numbers, complex numbers
- strings, bytes, bytearrays
- tuples, lists, sets, and dictionaries containing only picklable objects
- functions defined at the top level of a module (using def, not lambda)
- built-in functions defined at the top level of a module
- classes that are defined at the top level of a module
- instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
Everything else cannot be pickled. In your case, although it is hard to tell from the code excerpt, I believe the problem is that the class Parameters is not defined at the top level of the module, and hence its instances can't be pickled.
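A self-contained way to reproduce the same PicklingError, and the usual fix, is sketched below; it only illustrates the top-level rule and is not your actual Parameters class:
import pickle

def make_class():
    # Parameters is created inside a function, so no module has it as a
    # top-level attribute -- instances of it cannot be pickled
    class Parameters(object):
        pass
    return Parameters()

class TopLevelParameters(object):
    # defined at the top level of the module: resolvable as
    # <module>.TopLevelParameters, so instances pickle fine
    pass

try:
    pickle.dumps(make_class())
except Exception as e:    # PicklingError on 2.7; type/wording differ on 3.x
    print('failed: %s' % e)

print(len(pickle.dumps(TopLevelParameters())) > 0)   # True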
The whole point of using pathos.multiprocessing (or its actively developed fork multiprocess) instead of the built-in multiprocessing is to avoid pickle, because there are too many things the latter can't dump. pathos.multiprocessing and multiprocess use dill instead of pickle. If you want to debug a worker, you can use trace.
NOTE: As Mike McKerns (the main contributor to multiprocess) rightly noticed, there are cases that even dill cannot handle, though it is hard to formulate universal rules on that matter.
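To make the difference concrete, here is a small comparison; it illustrates the general point only, not a guarantee for any particular object:
import pickle
import dill

square = lambda x: x * x        # a lambda: the stdlib pickle refuses it

try:
    pickle.dumps(square)
except Exception as e:          # PicklingError (exact wording depends on the Python version)
    print('pickle failed: %s' % e)

payload = dill.dumps(square)    # dill serializes the function by value
print(dill.loads(payload)(4))   # 16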