multiprocessing.RawArray operation

Question

I read that a RawArray can be shared between processes without being copied, and I would like to understand how this is achieved in Python.

I saw in sharedctypes.py that a RawArray is constructed from a BufferWrapper from heap.py, and then zeroed with ctypes.memset.
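As a small illustration of that construction, the sketch below creates two RawArray objects and reads them back. The variable names are only for illustration; the point is that an array created from a length alone comes back zero-filled.

```python
from multiprocessing.sharedctypes import RawArray

# Created from a length only: the underlying buffer is zeroed.
zeros = RawArray('i', 5)

# Created from a sequence: the values are copied into the shared buffer.
filled = RawArray('i', [1, 2, 3])

zero_values = list(zeros)
filled_values = list(filled)
print(zero_values, filled_values)
```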

The BufferWrapper is made of an Arena object, which is itself built from an mmap (or from 100 mmaps on Windows; see line 40 of the linked source).

I read that the mmap system call is what actually allocates the memory on Linux/BSD, and that the Python module uses MapViewOfFile on Windows.
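For reference, a minimal sketch of allocating memory through the Python mmap module: passing -1 as the file descriptor requests an anonymous mapping that is not tied to any file on disk. The buffer size and the values written here are arbitrary.

```python
import mmap
import struct

# -1 means "no backing file": an anonymous block of 16 bytes.
buf = mmap.mmap(-1, 16)

# The mapping supports the buffer protocol, so struct can write into it directly.
struct.pack_into("4i", buf, 0, 10, 20, 30, 40)
values = struct.unpack_from("4i", buf, 0)

buf.close()
print(values)
```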

mmap seems handy, then. It appears to work directly with mp.Pool:

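from multiprocessing import Pool
from struct import pack_into
from mmap import mmap

def pack_into_mmap(idx_nums_tup):
    # Each task writes its half of the integers at its own byte offset.
    idx, ints_to_pack = idx_nums_tup
    pack_into(str(len(ints_to_pack)) + 'i', shared_mmap, idx*4*total//2, *ints_to_pack)


if __name__ == '__main__':
    # total and shared_mmap reach the workers only when they are forked,
    # so this relies on the fork start method.
    total = 5 * 10**7
    shared_mmap = mmap(-1, total * 4)  # anonymous mapping, 4 bytes per int
    ints_to_pack = range(total)

    pool = Pool()
    pool.map(pack_into_mmap, enumerate((ints_to_pack[:total//2], ints_to_pack[total//2:])))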

My question is:

How does the multiprocessing module know not to copy the mmap-based RawArray object between processes, as it does for "regular" Python objects?

Answer 1

[Python 3.Docs]: multiprocessing - Process-based parallelism serializes / deserializes the data exchanged between processes using a Python-specific protocol: [Python 3.Docs]: pickle - Python object serialization (hence the terms pickle / unpickle used from here on).

According to [Python 3.Docs]: pickle - object.__getstate__():
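To make the copying behaviour concrete, here is a minimal sketch of what a pickle round trip does to a regular object. It runs in a single process, so it only imitates the serialize / deserialize step that happens when data crosses a process boundary.

```python
import pickle

original = [1, 2, 3]

# Serialize and rebuild: the result is a new, independent object.
copy = pickle.loads(pickle.dumps(original))
copy.append(4)

# The change to the copy is not visible through the original.
print(original, copy)
```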

Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.

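As seen in (the Win variant of) Arena.__getstate__ (class chain: sharedctypes.RawArray -> heap.BufferWrapper -> heap.Heap -> heap.Arena), only the metadata (name and size) is pickled for an Arena instance, not the buffer itself.

Conversely, in __setstate__, the buffer is reconstructed from that metadata.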
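The same pattern can be sketched with a hypothetical class. FileArena below is not the real heap.Arena: it is an illustrative stand-in that backs its buffer with a temporary file, pickles only the file name and size, and re-maps the file when unpickled. The round trip happens in one process here, but the unpickled object still sees the same bytes rather than a copy.

```python
import mmap
import os
import pickle
import tempfile


class FileArena:
    """Hypothetical stand-in for an arena: a buffer backed by a named file."""

    def __init__(self, size):
        fd, self.name = tempfile.mkstemp()
        os.close(fd)
        with open(self.name, "wb") as f:
            f.truncate(size)  # reserve `size` zero bytes
        self.size = size
        self._attach()

    def _attach(self):
        # Map the named file; every mapping of it shares the same bytes.
        with open(self.name, "r+b") as f:
            self.buffer = mmap.mmap(f.fileno(), self.size)

    def __getstate__(self):
        # Only metadata is pickled, never the buffer contents.
        return (self.name, self.size)

    def __setstate__(self, state):
        # Rebuild the buffer from the metadata.
        self.name, self.size = state
        self._attach()


arena = FileArena(1 << 20)           # 1 MiB buffer
payload = pickle.dumps(arena)        # small: just a name and a size
clone = pickle.loads(payload)

clone.buffer[:5] = b"hello"          # write through the unpickled object
shared_bytes = bytes(arena.buffer[:5])  # read through the original

# Release the mappings before deleting the backing file.
clone.buffer.close()
arena.buffer.close()
os.remove(arena.name)
```

The pickled payload stays tiny regardless of the buffer size, which is the property that keeps the shared memory from being copied between processes.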

ctypes

mmap

multiprocessing

python-3.x