Segmentation Fault using Python Shared Memory
The function store_in_shm writes a numpy array to shared memory, while a second function read_from_shm creates a numpy array backed by the same shared memory space. However, running the code under Python 3.8 gives the following segmentation fault:
zsh: segmentation fault python foo.py
Why is there no problem accessing the numpy array inside read_from_shm, but a segmentation fault when the numpy array is accessed again outside the function?
Output:
From read_from_shm(): [0 1 2 3 4 5 6 7 8 9]
zsh: segmentation fault python foo.py
% /Users/athena/opt/anaconda3/envs/test/lib/python3.8/multiprocessing/resource_tracker.py:203: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
foo.py
import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    shm.close()
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shmData

if __name__ == '__main__':
    data = np.arange(10)
    shm = store_in_shm(data)
    shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)  # no seg fault if we comment this line
    shm.unlink()
Basically, the problem seems to be that the underlying mmap'ed file (owned by shm inside read_from_shm) is closed when the function returns. shmData still references that closed mmap afterward, and that is where you get the segfault. This seems to be a known bug, but it can be solved by keeping a reference to shm.
Additionally, all SharedMemory instances want to be close()'d, with exactly one of them being unlink()'ed when it is no longer needed. If you don't call shm.close() yourself, it will be called at GC, as mentioned, and on Windows, if it is the only currently "open" handle, the shared memory file will be deleted. When you call shm.close() inside store_in_shm, you introduce an OS dependency: on Windows the data will be deleted, while on MacOS and Linux it will be retained until unlink is called.
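The close()/unlink() split can be seen directly with a small sketch of the Linux/macOS behavior (the segment name 'close_unlink_demo' is arbitrary for this illustration): closing one handle does not remove the segment, so a second handle can still attach and read the data until unlink() is called. On Windows the second attach would instead fail, because closing the only open handle already destroys the segment.

```python
from multiprocessing import shared_memory

# Create a named segment, write one byte, and close this handle.
shm_a = shared_memory.SharedMemory(name='close_unlink_demo', create=True, size=4)
shm_a.buf[0] = 42
shm_a.close()  # this handle is detached, but on Linux/macOS the segment persists

# A second handle can still attach and read the data, because close()
# only detaches; the segment lives until unlink() removes it from the OS.
shm_b = shared_memory.SharedMemory(name='close_unlink_demo', create=False)
value = shm_b.buf[0]
print(value)  # → 42
shm_b.close()
shm_b.unlink()  # now the segment is actually removed
```

After the unlink(), attaching again with create=False raises FileNotFoundError, which is the same error the answer below mentions for the Windows case.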
Finally, though it doesn't appear in your code, another problem currently exists when accessing the data from independent processes (rather than child processes), which can similarly delete the underlying mmap too early. SharedMemory is a very new library, so hopefully all the kinks will be worked out soon.
You can re-write the given example to keep a reference to the "second" shm and use either one of them to unlink:
import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    # There must always be at least one SharedMemory object open for it to not
    # be destroyed on Windows, so we won't shm.close() inside the function,
    # but rather after we're done with everything.
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shm, shmData  # we need to keep a reference to shm both so we don't
                         # segfault on shmData and so we can close() it

if __name__ == '__main__':
    data = np.arange(10)
    shm1 = store_in_shm(data)
    # This is where *Windows* previously reclaimed the memory, resulting in a
    # FileNotFoundError because the temporary mmap'ed file had been released.
    shm2, shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)
    shm1.close()
    shm2.close()
    # On Windows, "unlink" happens automatically here whether you call unlink() or not.
    shm2.unlink()  # either shm1 or shm2