Is there any information in the numpy array header?

Question

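I have 360 files with the .bin extension, and I know they are 360 raw image files (16-bit grayscale). I guess the image size is about 1518x999. I'm confused about how to get the image data out of them. I inspected them and found 149 bytes that repeat at the beginning of every file and 15 bytes that repeat at the end of every file (marked with white boxes in the image below). Are headers and footers like these common in numpy arrays? (I can see "numpy multiarray ..." in the header bytes; see the image below.) Can I extract information about the image specs, such as width and height, from the header and footer? Here are three sample files.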

Answer 1

Yes. The header contains information about the array's type and size.
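To see where that type and size information sits, one option is to walk the pickle opcodes with the standard-library `pickletools.genops` and keep only the string and integer arguments. This is a minimal sketch: `header_fields` is a hypothetical helper name, and the exact opcode sequence is a NumPy/pickle implementation detail that may vary between versions. For a pickled `uint16` array you should see something like a module path containing `multiarray` (the text visible in the header bytes), the dtype code `'u2'` with byte order `'<'`, and the shape integers.

```python
import pickle
import pickletools

import numpy as np


def header_fields(data: bytes) -> list:
    """List (opcode name, argument) pairs whose argument is a str or int.

    bytes arguments (the raw pixel block) are skipped so the output stays short.
    genops only parses opcodes; it never executes the pickle.
    """
    fields = []
    for opcode, arg, _pos in pickletools.genops(data):
        if isinstance(arg, (str, int)):
            fields.append((opcode.name, arg))
    return fields


if __name__ == "__main__":
    # Small stand-in for one of the 999x1518 uint16 images.
    demo = np.zeros((3, 5), dtype=np.uint16)
    for name, arg in header_fields(pickle.dumps(demo, protocol=4)):
        print(f"{name:20} {arg!r}")
```

On a real file, pass `path.read_bytes()` instead of the in-memory pickle; because nothing is unpickled, this is safe even for files you don't trust.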

With numpy (and pillow), you can easily retrieve the images as follows.

# Using python 3.6 or higher.
# To install numpy and pillow, run: pip3 install numpy pillow

from pathlib import Path
import numpy as np
from PIL import Image

input_dir = Path("./binFiles")  # Directory where *.bin files are stored.
output_dir = Path("./_out")  # Directory where you want to output the image files.

output_dir.mkdir(parents=True, exist_ok=True)
for path in input_dir.rglob("*.bin"):
    buf = np.load(path, allow_pickle=True)
    image = Image.fromarray(buf)
    image.save(output_dir / (path.stem + ".png"))
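If you would rather not unpickle the files at all (`allow_pickle=True` can execute arbitrary code when loading untrusted files), the pixels can also be read straight from their byte offsets. This sketch assumes the layout described in the question holds for every file: a 149-byte header, little-endian 16-bit pixels in row-major order with shape 999x1518, and a 15-byte footer, i.e. 149 + 3,032,964 + 15 = 3,033,128 bytes per file. `read_pixels` and the constants are hypothetical names, and the size check guards against files that don't match the assumed layout.

```python
from pathlib import Path

import numpy as np

# Assumed layout (from the question); verify these against your own files.
HEADER_BYTES = 149
FOOTER_BYTES = 15
SHAPE = (999, 1518)        # (rows, columns), assumed row-major (C) order
DTYPE = np.dtype("<u2")    # assumed little-endian unsigned 16-bit


def read_pixels(path):
    """Read the pixel block between the assumed header and footer."""
    count = SHAPE[0] * SHAPE[1]
    expected = HEADER_BYTES + count * DTYPE.itemsize + FOOTER_BYTES
    size = Path(path).stat().st_size
    if size != expected:
        raise ValueError(f"{path}: {size} bytes, expected {expected}")
    # offset skips the header; count stops reading before the footer.
    pixels = np.fromfile(path, dtype=DTYPE, count=count, offset=HEADER_BYTES)
    return pixels.reshape(SHAPE)
```

If `np.isfortran` reports True for the arrays loaded with `np.load`, use `reshape(SHAPE, order="F")` instead.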

Here is an example. (The original PNG couldn't be uploaded, so I converted it.)

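Edit:

Question

Is there any more information in the header beyond what was retrieved?
Is there any information in the footer?

Answer

Theoretically, no to both.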

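Your files are actually not in the numpy file format (.npy); they are numpy objects serialized in the pickle file format. I was able to rebuild exactly matching files using only the dtype, the shape, the memory order, and a 3,032,964-byte (= 999 x 1518 x 2) array. So numpy or pickle may have added extra metadata, but these four are the only essential pieces of information (at least for the three files you provided).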
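The claim that only the raw pixels sit between the header and the footer can also be checked by locating the pixel block inside the file. This is a minimal sketch, assuming the files are trusted (it calls `pickle.loads`) and that the pixel bytes occur exactly once in the file; `split_array_file` is a hypothetical helper. It also tells a real `.npy` file (which starts with the magic string `\x93NUMPY`) apart from a pickle (protocol 2 and later start with the `\x80` PROTO opcode).

```python
import io
import pickle

import numpy as np

NPY_MAGIC = b"\x93NUMPY"  # first six bytes of every .npy file
PICKLE_PROTO = b"\x80"    # PROTO opcode that starts pickle protocol >= 2


def split_array_file(data: bytes):
    """Return (kind, header, footer) around the raw array bytes in `data`.

    Only use this on trusted files: the pickle branch runs pickle.loads.
    """
    if data.startswith(NPY_MAGIC):
        kind, array = "npy", np.load(io.BytesIO(data))
    elif data.startswith(PICKLE_PROTO):
        kind, array = "pickle", pickle.loads(data)
    else:
        raise ValueError("neither a .npy file nor a protocol 2+ pickle")

    # Same byte order as the answer's check: Fortran if Fortran-contiguous, else C.
    raw = array.tobytes("A")
    start = data.find(raw)
    # A second match (e.g. a constant image) would make the split ambiguous.
    if start < 0 or data.find(raw, start + 1) != -1:
        raise ValueError("raw array bytes not found exactly once")
    return kind, data[:start], data[start + len(raw):]
```

For one of the .bin files, `kind, header, footer = split_array_file(path.read_bytes())` lets you compare `len(header)` and `len(footer)` with the 149 and 15 bytes seen in the hex editor.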

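If you want to know about that "other metadata", I don't have an answer. You may want to ask a new, more focused question, since it is really about the pickle file format.

Here is the code I used for the check, in case you want to check other files as well.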

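import pickle
from pathlib import Path

import numpy as np

input_dir = Path("./binFiles")  # Directory where *.bin files are stored.
output_dir = Path("./_out")  # Directory for the rebuilt pickle files.
output_dir.mkdir(parents=True, exist_ok=True)

for input_path in input_dir.rglob("*.bin"):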
    # Load the original file.
    numpy_array = np.load(input_path, allow_pickle=True)

    # Convert to a byte array. 'A' means keep the order.
    bytes_array = numpy_array.tobytes('A')

    # Make sure there are no additional bytes other than the image pixels.
    assert len(bytes_array) == numpy_array.size * numpy_array.itemsize

    # Rebuild from byte array.
    # Note that rebuilt_array is constructed using only dtype, shape, order,
    # and a byte array matching the image size.
    rebuilt_array = np.frombuffer(
        bytes_array, dtype=numpy_array.dtype
    ).reshape(
        numpy_array.shape, order='F' if np.isfortran(numpy_array) else 'C'
    )

    # Pickle the rebuilt array (mimicking the original file).
    rebuilt_path = output_dir / (input_path.stem + ".pickle")
    with rebuilt_path.open(mode='wb') as fo:
        pickle.dump(rebuilt_array, fo, protocol=4)

    # Make sure there are no additional bytes other than the rebuilt array.
    assert rebuilt_path.read_bytes() == input_path.read_bytes()

    print(f"{input_path.name} passed!")
