multiprocessing.RawArray operation
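I read that a RawArray can be shared between processes without being copied, and wanted to understand how that is possible in Python. I saw in sharedctypes.py that a RawArray is constructed from a BufferWrapper from heap.py and then zero-filled with ctypes.memset.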
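A quick way to see this from a CPython interpreter is sketched below. It relies on private implementation details of multiprocessing.sharedctypes (the `_wrapper` attribute and BufferWrapper.create_memoryview), which may change between versions, so treat it as an illustration rather than a stable API.

```python
import ctypes
from multiprocessing import heap
from multiprocessing.sharedctypes import RawArray

# Allocate 8 C ints in shared memory; RawArray zero-fills them with ctypes.memset.
arr = RawArray(ctypes.c_int, 8)
initial = list(arr)
print(initial)  # [0, 0, 0, 0, 0, 0, 0, 0]

# CPython implementation detail (private attribute, may change): the ctypes array is
# created with from_buffer() over memory handed out by a heap.BufferWrapper.
wrapper = arr._wrapper
print(isinstance(wrapper, heap.BufferWrapper))  # True

# A write through the array is visible through the wrapper's own view of the block,
# i.e. both are views of the same mmap-backed memory rather than copies.
arr[0] = 42
view = wrapper.create_memoryview()
print(view.cast("i")[0])  # 42 ("i" is the native C int format, same as c_int)
```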
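BufferWrapper is made of an Arena object, which itself is built from an mmap (on Windows, the Arena constructor makes up to 100 attempts to create an mmap under a unique tag name, see line 40 in heap.py).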
I read that the mmap system call is commonly used to allocate memory on Linux/BSD, and that on Windows the Python mmap module uses MapViewOfFile instead.
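From Python, that kind of allocation is reachable through the mmap module: passing -1 as the file descriptor requests an anonymous mapping, i.e. a block of zero-filled memory that is not backed by a named file. A minimal sketch:

```python
import mmap
import struct

# fileno=-1 requests an anonymous mapping: memory not backed by a named file,
# zero-filled by the operating system.
buf = mmap.mmap(-1, 16)
zeroed = buf[:] == bytes(16)
print(zeroed)  # True

# mmap supports the buffer protocol, so struct can write into it and read from it in place.
struct.pack_into("4i", buf, 0, 1, 2, 3, 4)
values = struct.unpack_from("4i", buf, 0)
print(values)  # (1, 2, 3, 4)
buf.close()
```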
mmap seems handy, then. It even seems to work directly with mp.Pool:
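from mmap import mmap
from multiprocessing import Pool
from struct import pack_into


def pack_into_mmap(idx_nums_tup):
    idx, ints_to_pack = idx_nums_tup
    # Write this chunk of 4-byte ints into the shared buffer at byte offset idx * 4 * total // 2.
    # Note: shared_mmap and total are globals created under __main__; the workers only
    # see them because they are forked from the parent (POSIX "fork" start method).
    pack_into(str(len(ints_to_pack)) + 'i', shared_mmap, idx*4*total//2, *ints_to_pack)


if __name__ == '__main__':
    total = 5 * 10**7
    shared_mmap = mmap(-1, total * 4)  # anonymous mapping large enough for `total` ints
    ints_to_pack = range(total)
    with Pool() as pool:
        pool.map(pack_into_mmap, enumerate((ints_to_pack[:total//2], ints_to_pack[total//2:])))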
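The snippet above works because of how the pool workers are started. With the fork start method (the default on Linux up to Python 3.13) each child inherits the parent's memory mappings, and on Unix mmap(-1, ...) creates a MAP_SHARED anonymous mapping by default, so the children write into the very pages the parent reads. Under spawn (the default on Windows and macOS) the workers would not even have the shared_mmap and total globals. The sketch below isolates that effect; child_writes and run_demo are hypothetical names, and it explicitly asks for the fork context, so it is POSIX-only.

```python
import mmap
import multiprocessing as mp
import struct

# Created before any child exists; anonymous mappings default to MAP_SHARED on Unix.
shared = mmap.mmap(-1, 4)


def child_writes() -> None:
    # The forked child writes into the same physical pages the parent maps.
    struct.pack_into("i", shared, 0, 12345)


def run_demo() -> int:
    ctx = mp.get_context("fork")  # POSIX only; "fork" is not available on Windows
    p = ctx.Process(target=child_writes)
    p.start()
    p.join()
    # No data was sent back: the parent simply reads the shared mapping.
    return struct.unpack_from("i", shared, 0)[0]


if __name__ == "__main__":
    print(run_demo())  # 12345
```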
My question is: how does the multiprocessing module know not to copy the mmap-based RawArray object between processes, as it does with "regular" Python objects?
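Part of the answer can be observed directly. multiprocessing sends arguments to pool workers and items through queues using its own pickler, multiprocessing.reduction.ForkingPickler, and sharedctypes registers a reducer for the RawArray's ctypes type that first calls assert_spawning(). Outside of process start-up that check raises RuntimeError, so (as far as the CPython sources show) a RawArray cannot be sent through Pool.map; it is meant to be inherited by a child, or passed to it when the child is created (e.g. in Process(args=...)). A sketch relying on those CPython internals:

```python
import ctypes
from multiprocessing.reduction import ForkingPickler
from multiprocessing.sharedctypes import RawArray

arr = RawArray(ctypes.c_double, 4)

try:
    # Outside of process start-up, the reducer registered by sharedctypes
    # calls assert_spawning(), which refuses to serialize the array.
    ForkingPickler.dumps(arr)
    refused = False
except RuntimeError as exc:
    refused = True
    print(exc)  # says the object should only be shared through inheritance

print(refused)  # True
```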
[Python 3.Docs]: multiprocessing - Process-based parallelism serializes / deserializes the data exchanged between processes using a Python-specific protocol: [Python 3.Docs]: pickle - Python object serialization (hence the terminology used from here on: pickle / unpickle).
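For "regular" objects, pickling means copying: the receiving side rebuilds a brand new object from the serialized bytes, so changes made on one side are not seen on the other. A single-process illustration, using a pickle round trip as a stand-in for what happens between processes:

```python
import pickle

original = {"values": [1, 2, 3]}

# Serialize and deserialize, roughly what happens when data is sent to a worker process.
received = pickle.loads(pickle.dumps(original))
received["values"].append(4)

print(original["values"])    # [1, 2, 3]: the original is untouched
print(received["values"])    # [1, 2, 3, 4]
print(received is original)  # False

# The pickle stream carries the data itself, so its size grows with the data.
big = bytearray(1_000_000)
print(len(pickle.dumps(big)) > 1_000_000)  # True
```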
According to [Python 3.Docs]: pickle - object.__getstate__():
Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.
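A minimal, hypothetical class showing the mechanism the quoted paragraph describes: __getstate__ chooses what gets pickled (here only a small piece of metadata), and __setstate__ rebuilds the rest on the receiving side. The class name Handle and its fields are made up for illustration.

```python
import pickle


class Handle:
    """Hypothetical object whose pickled form carries only a name, not its payload."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.payload = bytearray(1_000_000)  # large data we do NOT want serialized

    def __getstate__(self):
        # Only this small tuple is written to the pickle stream.
        return (self.name,)

    def __setstate__(self, state):
        # Called on unpickling instead of restoring __dict__; rebuild from the metadata.
        (self.name,) = state
        self.payload = bytearray(1_000_000)  # stand-in for "reopen the existing resource"


h = Handle("demo")
data = pickle.dumps(h)
print(len(data) < 1000)  # True: the payload was not serialized
clone = pickle.loads(data)
print(clone.name)        # demo
```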
As can be seen in (the Windows variant of) Arena.__getstate__ (class chain: sharedctypes.RawArray -> heap.BufferWrapper -> heap.Heap -> heap.Arena), only the metadata (name and size) is pickled for an Arena instance, not the buffer itself.
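Conversely, in __setstate__, the buffer is rebuilt from that metadata: the existing mmap is reopened by its tag name, so the unpickled Arena refers to the same memory instead of a copy.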
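The same pattern, applied to shared memory, can be sketched as follows. This is a simplified, hypothetical analogue of Arena, not CPython's code: instead of a Windows tag name it identifies the memory by the path of a temporary file, so the pickle stream carries only (path, size) and unpickling maps the same file again. Both objects then view the same bytes.

```python
import mmap
import os
import pickle
import tempfile


class FileBackedBuffer:
    """Hypothetical Arena-like buffer: pickled as (path, size), never as its contents."""

    def __init__(self, size: int) -> None:
        fd, self.path = tempfile.mkstemp(prefix="shbuf-")
        os.ftruncate(fd, size)  # extend the empty file to `size` zero bytes
        os.close(fd)
        self.size = size
        self._map()

    def _map(self) -> None:
        # Default (shared) mapping of the file: every mapping of it sees the same bytes.
        with open(self.path, "r+b") as f:
            self.buffer = mmap.mmap(f.fileno(), self.size)

    def __getstate__(self):
        # Only metadata is written to the pickle stream.
        return (self.path, self.size)

    def __setstate__(self, state):
        # Rebuild the buffer from the metadata by mapping the same file again.
        self.path, self.size = state
        self._map()

    def close(self) -> None:
        self.buffer.close()


a = FileBackedBuffer(16)
payload = pickle.dumps(a)   # small: only the path and the size
b = pickle.loads(payload)   # an independent object...
a.buffer[:5] = b"hello"
result = b.buffer[:5]       # ...that views the same bytes
print(result)               # b'hello'
a.close()
b.close()
os.remove(a.path)
```

Pickling by reference like this only makes sense when both sides can reach the same underlying resource (here, the same file), which is exactly the situation between a parent and the child processes it starts.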