Segmentation Fault using Python Shared Memory

The function store_in_shm writes a numpy array to shared memory, while the second function read_from_shm creates a numpy array using the data in the same shared memory space and returns that numpy array.

However, running the code in Python 3.8 gives the following segmentation fault:

zsh: segmentation fault python foo.py

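Why is there no problem accessing the numpy array inside the function read_from_shm, but a segmentation fault when the numpy array is accessed again outside the function?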

Output:

From read_from_shm(): [0 1 2 3 4 5 6 7 8 9]
zsh: segmentation fault  python foo.py
% /Users/athena/opt/anaconda3/envs/test/lib/python3.8/multiprocessing/resource_tracker.py:203: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

foo.py

import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    shm.close()
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shmData

if __name__ == '__main__':
    data = np.arange(10)
    shm = store_in_shm(data)
    shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)    # no seg fault if we comment this line
    shm.unlink()

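Basically, the problem seems to be that the underlying mmap'ed file (owned by shm inside read_from_shm) is closed when shm is garbage-collected as the function returns. shmData still refers to that memory afterwards, which is where you get the segfault (you are referencing a closed mmap). This seems to be a known bug, but it can be worked around by keeping a reference to shm.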
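Besides keeping a separate variable for shm, one way to hold that reference (a minimal sketch, not the approach used in this answer) is to store the SharedMemory object on the array itself, so the block cannot be garbage-collected, and therefore implicitly closed, while the array is alive. ShmArray and the _shm attribute are hypothetical names, and this variant of read_from_shm takes the block name as a parameter, which is an assumption.

```python
import numpy as np
from multiprocessing import shared_memory


class ShmArray(np.ndarray):
    """Hypothetical helper: an ndarray backed by shm.buf that keeps shm alive."""

    def __new__(cls, shape, dtype, shm):
        obj = super().__new__(cls, shape, dtype=dtype, buffer=shm.buf)
        # Strong reference: shm (and its mmap) cannot be garbage-collected,
        # and therefore not implicitly closed, while this array exists.
        obj._shm = shm
        return obj

    def __array_finalize__(self, obj):
        # Views and slices taken from a ShmArray carry the same reference.
        self._shm = getattr(obj, "_shm", None)


def read_from_shm(name, shape, dtype):
    # Same idea as the question's function, but only the array is returned;
    # the SharedMemory object travels with it.
    shm = shared_memory.SharedMemory(name=name, create=False)
    return ShmArray(shape, dtype, shm)
```

When you are done, drop the array and its views first and then close the block through the stored reference (for example shm = arr._shm; del arr; shm.close()). Touching the array after close() is unsafe for the same reason as in the original code.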

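Additionally, every SharedMemory instance should be close()'d, and exactly one of them should be unlink()'d once the block is no longer needed. If you don't call shm.close() yourself, it will be called on garbage collection, as mentioned above, and on Windows, if that was the only instance currently "open", the shared memory file will be deleted. By calling shm.close() inside store_in_shm, you introduce an OS dependency: on Windows the data will be deleted, while on macOS and Linux it will persist until unlink is called.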
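The close()/unlink() rules above can be written down as a small context manager built on contextlib.contextmanager. shm_block and demo are hypothetical names, not part of multiprocessing.shared_memory, and the sketch assumes a 10-element int64 array as in the question. Every instance it opens is closed, only the designated owner unlinks, and the owner stays open while the reader attaches.

```python
from contextlib import contextmanager

import numpy as np
from multiprocessing import shared_memory


@contextmanager
def shm_block(name=None, create=False, size=0, unlink=False):
    """Hypothetical helper: open or attach a block, always close() it,
    and unlink() it only if this instance is the designated owner."""
    shm = shared_memory.SharedMemory(name=name, create=create, size=size)
    try:
        yield shm
    finally:
        shm.close()       # every instance is closed...
        if unlink:
            shm.unlink()  # ...but only one of them unlinks the block


def demo():
    # The owner is still open while the reader attaches, so the block is not
    # released early on Windows (unlike closing it inside store_in_shm).
    with shm_block(create=True, size=10 * 8, unlink=True) as owner:
        a = np.ndarray((10,), dtype=np.int64, buffer=owner.buf)
        a[:] = np.arange(10)
        with shm_block(name=owner.name) as reader:
            b = np.ndarray((10,), dtype=np.int64, buffer=reader.buf)
            total = int(b.sum())
            del b  # drop arrays that view the buffer before it is closed
        del a
    return total
```

Calling demo() should return 45, the sum of np.arange(10) read back through the second instance.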

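Finally, although it doesn't come up in your code, there is currently another problem when the data is accessed from independent processes (rather than child processes): the underlying mmap can likewise be deleted too early. SharedMemory is a very new library, and hopefully all of these issues will be fixed soon.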
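For the independent-process case, a commonly used workaround (my assumption, not part of the original answer) is to stop the attaching process's resource tracker from cleaning up a block it did not create. On Python 3.13+ SharedMemory accepts track=False; on older POSIX versions the sketch calls multiprocessing.resource_tracker.unregister with the private shm._name attribute, which may change between versions. attach_untracked is a hypothetical name, and it is meant for the attaching process only: the creator should keep its registration so the block is still cleaned up if it crashes.

```python
import os
import sys
from multiprocessing import shared_memory


def attach_untracked(name):
    """Hypothetical helper: attach to an existing block without letting this
    process's resource tracker unlink it when the process exits."""
    if sys.version_info >= (3, 13):
        # Python 3.13 added the keyword-only ``track`` parameter.
        return shared_memory.SharedMemory(name=name, track=False)
    shm = shared_memory.SharedMemory(name=name)
    if os.name == "posix":
        # Older versions register every attached block on POSIX; undo that.
        # Relies on the private ``_name`` attribute (the "/..." POSIX name).
        from multiprocessing import resource_tracker
        resource_tracker.unregister(shm._name, "shared_memory")
    return shm
```

The attaching process still has to close() its instance; unlink() remains the creator's job.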

You can rewrite the given example to keep a reference to the "second" shm and only unlink one of them:

import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    #there must always be at least one `SharedMemory` object open for it to not
    #  be destroyed on Windows, so we won't `shm.close()` inside the function,
    #  but rather after we're done with everything.
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shm, shmData #we need to keep a reference to shm both so we don't
                        #  segfault on shmData and so we can `close()` it.

if __name__ == '__main__':
    data = np.arange(10)
    shm1 = store_in_shm(data)
    #This is where *Windows* previously reclaimed the memory, resulting in a
    #  FileNotFoundError because the temporary mmap'ed file had been released.
    shm2, shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)
    shm1.close()
    shm2.close() #shmData points into shm2's mapping; don't use it after this
    #on Windows "unlink" happens automatically here whether you call `unlink()` or not.
    shm2.unlink() #either shm1 or shm2