Flushing numpy memmap to npy file

Question

Is there a way to save a numpy memory-mapped array to a .npy file? Apparently there is a way to load such an array from a .npy file, as follows:

data = numpy.load("input.npy", mmap_mode='r')

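But flushing the file is not equivalent to storing it in .npy format.

If flushing is the only way, is there a way to infer the shape of the stored array? I would prefer a dynamic shape that is stored and retrieved automatically (possibly as a memmap again) in another script.

I have searched for this in various places without finding anything. What I do now to store the data as .npy is: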

numpy.save(output.filename, output.copy())

This defeats the purpose of using a memmap, but it preserves the shape.

Note: I know about HDF5 and h5py, but I was wondering whether there is a pure numpy solution.

Answer 1

is there a way to infer the shape of the stored array?

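No. As far as np.memmap is concerned, the file is just a buffer: it stores the contents of the array, but not the dimensions, dtype, etc. There is no way to infer that information unless it is somehow contained within the array itself. If you have already created an np.memmap backed by a simple binary file, then you need to write its contents to a new .npy file on disk.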
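To illustrate the point about the raw buffer, here is a minimal sketch using a small temporary file (the path and the 4 x 5 float32 shape are arbitrary choices for the example). It shows that the file holds only the array bytes, and that reopening it without supplying dtype and shape yields a flat uint8 view rather than the original array.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'raw.mm')

a = np.memmap(path, mode='w+', dtype=np.float32, shape=(4, 5))
a[:] = 1.0
a.flush()

# The file contains exactly the array bytes: 4 * 5 elements * 4 bytes each.
file_size = os.path.getsize(path)

# Reopened without dtype/shape, np.memmap falls back to a flat uint8 array.
b = np.memmap(path, mode='r')

# The original layout comes back only if the caller supplies it again.
c = np.memmap(path, mode='r', dtype=np.float32, shape=(4, 5))
```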

You can avoid generating a copy in memory by opening the new .npy file as another memory-mapped array using numpy.lib.format.open_memmap:

import numpy as np
from numpy.lib.format import open_memmap

# a 10GB memory-mapped array
x = np.memmap('/tmp/x.mm', mode='w+', dtype=np.ubyte, shape=(int(1E10),))

# create a memory-mapped .npy file with the same dimensions and dtype
y = open_memmap('/tmp/y.npy', mode='w+', dtype=x.dtype, shape=x.shape)

# copy the array contents
y[:] = x[:]
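If you would rather bound how much data is in flight at once, the same copy can be done in slices along the first axis. The following is only a sketch: the helper name raw_memmap_to_npy, the chunk size, and the small test shape are assumptions made for the example, and it expects an array with at least one dimension.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def raw_memmap_to_npy(src, dst_path, chunk=1 << 20):
    """Copy an array into a new memory-mapped .npy file, `chunk` rows at a time."""
    dst = open_memmap(dst_path, mode='w+', dtype=src.dtype, shape=src.shape)
    for start in range(0, src.shape[0], chunk):
        stop = min(start + chunk, src.shape[0])
        dst[start:stop] = src[start:stop]
    dst.flush()  # make sure the copied pages reach the file
    return dst


# Small stand-in for the 10GB array above.
tmpdir = tempfile.mkdtemp()
x = np.memmap(os.path.join(tmpdir, 'x.mm'), mode='w+',
              dtype=np.ubyte, shape=(1000, 3))
x[:] = (np.arange(3000) % 256).astype(np.ubyte).reshape(1000, 3)
x.flush()

npy_path = os.path.join(tmpdir, 'y.npy')
raw_memmap_to_npy(x, npy_path, chunk=128)

# Shape and dtype are now recovered from the .npy header.
restored = np.load(npy_path, mmap_mode='r')
```

Another script can then call np.load on the .npy file without knowing the shape or dtype in advance.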

Answer 2

An array saved using np.save is essentially a memmap with a header specifying the dtype, shape, and element order. You can read more about it in the numpy documentation.

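When you create your np.memmap, you can reserve space for that header with the offset parameter. The numpy documentation specifies that the header length should be a multiple of 64.

Assuming you reserve 2 * 64 = 128 bytes for the header (more on this below):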

import numpy as np
x = np.memmap('/tmp/x.npy', mode='w+', dtype=np.ubyte, 
              shape=(int(1E10),), offset=128)

Then, when you are finished manipulating the memmap, you create and write the header using np.lib.format:

header = np.lib.format.header_data_from_array_1_0(x)

with open('/tmp/x.npy', 'r+b') as f:
    np.lib.format.write_array_header_1_0(f, header)

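Note that this writes the header at the very start of the memmap file, so if the serialized header is longer than 128 bytes it will overwrite part of the data and your file will be unreadable. If it is shorter than 128 bytes, np.load will look for the data at the wrong offset. The header consists of a fixed-length magic string (6 bytes), two version bytes, two bytes specifying the header length, and a string representation of a dictionary specifying 'shape', 'descr', and 'fortran_order'. If you know the shape and dtype (descr) of your array, you can easily compute the header length (I fixed it at 128 above for simplicity).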
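One way to avoid guessing the 128 is to serialize the header into an in-memory buffer first and measure it. The sketch below assumes the shape and dtype are known before the memmap is created; npy_header_size is a hypothetical helper, and the small float64 array and temporary path are arbitrary choices for the example.

```python
import io
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def npy_header_size(shape, dtype, fortran_order=False):
    """Return the total size in bytes of the version 1.0 header for this array."""
    header = {
        'descr': npy_format.dtype_to_descr(np.dtype(dtype)),
        'fortran_order': fortran_order,
        'shape': tuple(shape),
    }
    buf = io.BytesIO()
    npy_format.write_array_header_1_0(buf, header)
    return len(buf.getvalue())


shape, dtype = (1000, 3), np.float64
offset = npy_header_size(shape, dtype)

# Reserve exactly that many bytes in front of the data.
path = os.path.join(tempfile.mkdtemp(), 'x.npy')
x = np.memmap(path, mode='w+', dtype=dtype, shape=shape, offset=offset)
x[:] = np.arange(3000, dtype=dtype).reshape(shape)
x.flush()

# Write the real header into the reserved space and record where it ended.
with open(path, 'r+b') as f:
    npy_format.write_array_header_1_0(
        f, npy_format.header_data_from_array_1_0(x))
    written = f.tell()

# Sanity check: the file should now open as an ordinary .npy file.
y = np.load(path, mmap_mode='r')
```

Comparing written with offset before relying on the file is a cheap guard against the overwrite problem described above.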

After writing the header, you can load the data using np.load:

y = np.load('/tmp/x.npy')

If the memmap you saved is large, you may want to load the data as a memmap again:

y = np.load('/tmp/x.npy', mmap_mode='r')
