multiprocessing.RawArray operation

I read that a RawArray can be shared between processes without being copied, and I would like to understand how that is implemented in Python.

I saw in sharedctypes.py that a RawArray is constructed from a BufferWrapper from heap.py, and is then zeroed with ctypes.memset.
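To make the starting point concrete, here is a minimal sketch of what creating a RawArray looks like from the caller's side. The variable names (arr, initial, nbytes) are only illustrative; the sketch relies on the documented behavior that passing an integer size yields zeroed memory.

```python
import ctypes
from multiprocessing.sharedctypes import RawArray

# Allocate a shared array of 4 C ints; an integer size gives zeroed memory.
arr = RawArray('i', 4)
initial = list(arr)

# The result is a plain ctypes array, so normal indexing works.
arr[0] = 7

# Total size of the underlying buffer in bytes.
nbytes = ctypes.sizeof(arr)
```

The object carries no lock, so any synchronization between processes is left to the caller.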

The BufferWrapper is made of an Arena object, which is itself built from an mmap (or 100 mmaps on Windows, see line 40 of the linked source).

I read that the mmap system call is what is actually used to allocate memory on Linux/BSD, and that the Python module uses MapViewOfFile on Windows.

So mmap looks convenient. It seems to work directly with multiprocessing.Pool:
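from multiprocessing import Pool
from struct import pack_into
from mmap import mmap

def pack_into_mmap(idx_nums_tup):
    # Relies on the fork start method: workers inherit the globals
    # shared_mmap and total from the parent (they are undefined under spawn).
    idx, ints_to_pack = idx_nums_tup
    # Each int takes 4 bytes, so chunk idx starts at byte idx * 4 * total // 2.
    pack_into(str(len(ints_to_pack)) + 'i', shared_mmap, idx*4*total//2, *ints_to_pack)


if __name__ == '__main__':

    total = 5 * 10**7
    # Anonymous mapping large enough for `total` 4-byte ints.
    shared_mmap = mmap(-1, total * 4)
    ints_to_pack = range(total)

    pool = Pool()
    pool.map(pack_into_mmap, enumerate((ints_to_pack[:total//2], ints_to_pack[total//2:])))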

My question is:

How does the multiprocessing module know not to copy the mmap-based RawArray object between processes, the way it does for "regular" Python objects?

[Python 3.Docs]: multiprocessing - Process-based parallelism serializes / deserializes data exchanged between processes using a proprietary protocol: [Python 3.Docs]: pickle - Python object serialization (hence the terms pickle / unpickle used from here on).

According to [Python 3.Docs]: pickle - object.__getstate__():

Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.
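As a small illustration of the quoted rule, the sketch below uses a made-up class (Report, with a hypothetical cache attribute) whose __getstate__() leaves a bulky attribute out of the pickle. It is not taken from the multiprocessing source.

```python
import pickle

class Report:
    def __init__(self, title):
        self.title = title
        self.cache = bytearray(1024)  # bulky payload we do not want serialized

    def __getstate__(self):
        # This dict is pickled instead of the instance's full __dict__.
        return {"title": self.title}

# Round-trip through pickle: only the returned state is restored.
restored = pickle.loads(pickle.dumps(Report("demo")))
```

After the round trip, restored has a title but no cache attribute, because the cache never entered the pickle.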

As seen in (the Win variant of) Arena.__getstate__ (class chain: sharedctypes.RawArray -> heap.BufferWrapper -> heap.Heap -> heap.Arena), only the metadata (name and size) is pickled for the Arena instance, not the buffer itself.

Conversely, in __setstate__, the buffer is constructed from the (above) metadata.
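The same idea can be sketched with a simplified stand-in. FileBackedBuffer below is a hypothetical class, not the real heap.Arena, and it assumes a temporary file as the backing store where the real code may use a different mechanism. It pickles only a path and a size, and remaps the same memory in __setstate__. The round trip happens inside one process just to keep the example short.

```python
import mmap
import os
import pickle
import tempfile

class FileBackedBuffer:
    """Hypothetical stand-in for an arena: pickles metadata, not bytes."""

    def __init__(self, path, size):
        self.path = path
        self.size = size
        self._attach()

    def _attach(self):
        # Map the file named by the metadata; all mappings share its pages.
        with open(self.path, "r+b") as f:
            self.buffer = mmap.mmap(f.fileno(), self.size)

    def __getstate__(self):
        # Only the metadata travels through pickle.
        return (self.path, self.size)

    def __setstate__(self, state):
        # Rebuild the buffer from the metadata on the receiving side.
        self.path, self.size = state
        self._attach()

# Create a 16-byte backing file for the demonstration.
fd, path = tempfile.mkstemp()
os.write(fd, bytes(16))
os.close(fd)

original = FileBackedBuffer(path, 16)
original.buffer[8:16] = b"SENTINEL"

payload = pickle.dumps(original)   # contains the path and size only
clone = pickle.loads(payload)      # remaps the same file

# A write through the clone is visible through the original mapping.
clone.buffer[:5] = b"hello"
seen_by_original = bytes(original.buffer[:5])

# Clean up the mappings and the temporary file.
original.buffer.close()
clone.buffer.close()
os.remove(path)
```

The pickle stays small however large the buffer is, and the unpickled object refers to the same memory rather than to a copy of it.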