Does mpi4py use Python pickle?

From the mpi4py documentation, I can't tell whether it uses pickle or something more efficient than pickle. At first, the documentation states:

The pickle (slower, written in pure Python) and cPickle (faster, written in C) modules provide user-extensible facilities to serialize generic Python objects using ASCII or binary formats. The marshal module provides facilities to serialize built-in Python objects using a binary format specific to Python, but independent of machine architecture issues.

Based on this, pickle appears to be the slowest method. The documentation then says:

MPI for Python can communicate any built-in or user-defined Python object taking advantage of the features provided by the pickle module.

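So is MPI using pickle, the slowest option? There is more text, but I don't see a direct answer. Perhaps it is not a direct implementation? Am I completely misunderstanding what it says?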

It uses cPickle whenever it can, but falls back on pickle if necessary:

if PY_MAJOR_VERSION >= 3:
    from pickle import dumps as PyPickle_dumps
    from pickle import loads as PyPickle_loads
    from pickle import DEFAULT_PROTOCOL as PyPickle_PROTOCOL
else:
    try:
        from cPickle import dumps as PyPickle_dumps
        from cPickle import loads as PyPickle_loads
        from cPickle import HIGHEST_PROTOCOL as PyPickle_PROTOCOL
    except ImportError:
        from pickle  import dumps as PyPickle_dumps
        from pickle  import loads as PyPickle_loads
        from pickle  import HIGHEST_PROTOCOL as PyPickle_PROTOCOL

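Since the pympi wrapper is based on MPI-2, for parallel I/O I would guess that it only uses pickle to convert the data to the proper format (on each process) before MPI is called, and that the writing/communication then happens behind the scenes. The line right after the one you quoted says: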

These facilities will be routinely used to build binary representations of objects to communicate (at sending processes), and restoring them back (at receiving processes).
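To make that quoted sentence concrete, here is a minimal sketch of the round trip it describes, using only the standard pickle module. The helper names `to_wire` and `from_wire` are made up for illustration and are not part of mpi4py; no MPI call is involved here, the bytes are simply passed along in the same process.

```python
import pickle


def to_wire(obj):
    """Sender side: build a binary representation of any picklable object."""
    # DEFAULT_PROTOCOL mirrors the Python 3 branch of the import snippet above.
    return pickle.dumps(obj, protocol=pickle.DEFAULT_PROTOCOL)


def from_wire(payload):
    """Receiver side: restore the object from the received bytes."""
    return pickle.loads(payload)


message = {"rank": 0, "values": [1.5, 2.5], "tag": "demo"}

payload = to_wire(message)     # the bytes that would be handed to the transport
restored = from_wire(payload)  # what the receiving process would get back

print(type(payload).__name__, restored == message)
```

The restored object is an equal copy, not the same object, which is what you would expect when the two ends live in different processes.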

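The mpi4py documentation recommends using numpy arrays instead of Python data types wherever possible, for efficiency. If speed is critical for your application, I would recommend always using numpy arrays rather than Python objects. The MPI 2.0 standard was introduced mainly to provide efficient parallel I/O. Using MPI (written in C) will most likely be faster than cPickle or pickle, especially when writing output from many processes.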
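A rough sketch of why the array advice matters, with some assumptions: `array.array` stands in for a numpy array so that this runs with the standard library alone, and nothing here calls MPI. It only contrasts the two kinds of payload: a generic Python list has to be serialized into a new bytes object, while a contiguous typed buffer can be exposed as-is.

```python
import pickle
from array import array

n = 10_000

# Generic Python object: a list of separate float objects.
as_list = [float(i) for i in range(n)]

# Contiguous block of C doubles (stand-in for a numpy array of float64).
as_buffer = array("d", as_list)

# Generic path: the whole list is walked and serialized into new bytes.
pickled = pickle.dumps(as_list, protocol=pickle.DEFAULT_PROTOCOL)

# Buffer path: a memoryview exposes the existing memory without copying it.
view = memoryview(as_buffer)

print("pickled bytes:", len(pickled))
print("raw buffer bytes:", view.nbytes)
```

In mpi4py itself, the lowercase methods such as `comm.send(obj, dest=1)` take the pickle route for generic objects, while the uppercase methods such as `comm.Send(buf, dest=1)` work on buffer-like objects such as numpy arrays.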