Read a binary file using Numpy fromfile and a given offset

Question

I have a binary file containing aircraft position records. Each record looks like this:

0x00: Time, float32
0x04: X, float32 // X axis position
0x08: Y, float32 // Y axis position
0x0C: Elevation, float32
0x10: float32*4 = Quaternion (x,y,z axis and w scalar)
0x20: Distance, float32 (unused)

So each record is 36 bytes long (nine float32 fields; the last one, Distance, starts at 0x20 = 32).

I want to read the records into a NumPy array.

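At offset 1859 there is an unsigned 32-bit integer (4 bytes) that gives the number of array elements: 12019 in my case.

I don't care about the header data (before offset 1859), at least for now.

The array itself starts at offset 1863 (= 1859 + 4).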

I defined my own NumPy dtype:

import numpy as np

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])
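As a quick sanity check on the layout above, here is a small sketch that prints each field's byte offset and the record size NumPy will use when stepping through the file. It assumes the file is little-endian (`"<f4"`), which the question does not state; `record_dtype` is just an illustrative name.

```python
import numpy as np

# Same fields as the question's dtype, with explicit little-endian float32.
# Little-endian is an assumption; use ">f4" if the file is big-endian.
record_dtype = np.dtype([
    ("time", "<f4"),
    ("PosX", "<f4"),
    ("PosY", "<f4"),
    ("Alt", "<f4"),
    ("Qx", "<f4"),
    ("Qy", "<f4"),
    ("Qz", "<f4"),
    ("Qw", "<f4"),
    ("dist", "<f4"),
])

# fields[name] is a (dtype, byte_offset) tuple; compare with the layout table.
for name in record_dtype.names:
    offset = record_dtype.fields[name][1]
    print(f"0x{offset:02X}: {name}")

# Nine float32 fields, 4 bytes each.
print(record_dtype.itemsize)
```

The offsets should line up with the 0x00 to 0x20 table, and `itemsize` comes out as 36, not 32.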

I'm reading the file with fromfile:

a_bytes = np.fromfile(filename, dtype=dtype)

But I don't see any parameter I can pass to fromfile to specify an offset.
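For reference, NumPy 1.17 and later do accept an `offset` argument to `np.fromfile`. It is measured in bytes from the file's current position, which is the start of the file when a path is passed. Below is a minimal sketch that reads the uint32 count first and then exactly that many records. It assumes little-endian data, and the names `read_records`, `COUNT_OFFSET` and `DATA_OFFSET` are hypothetical. The demo runs against a small synthetic file rather than the real one.

```python
import os
import tempfile

import numpy as np

# Question's record layout; little-endian byte order is an assumption.
record_dtype = np.dtype([(name, "<f4") for name in
                         ("time", "PosX", "PosY", "Alt",
                          "Qx", "Qy", "Qz", "Qw", "dist")])

COUNT_OFFSET = 1859             # hypothetical constant: uint32 record count
DATA_OFFSET = COUNT_OFFSET + 4  # hypothetical constant: first record (1863)


def read_records(path):
    """Hypothetical helper: read the count, then exactly n records (NumPy >= 1.17)."""
    n = int(np.fromfile(path, dtype="<u4", count=1, offset=COUNT_OFFSET)[0])
    return np.fromfile(path, dtype=record_dtype, count=n, offset=DATA_OFFSET)


# Synthetic test file: zero-filled header, count = 3, then three records.
demo = np.zeros(3, dtype=record_dtype)
demo["time"] = [0.0, 0.5, 1.0]
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x00" * COUNT_OFFSET)
    tmp.write(np.array([len(demo)], dtype="<u4").tobytes())
    tmp.write(demo.tobytes())
    path = tmp.name

records = read_records(path)
os.remove(path)
print(len(records), records["time"])
```

Passing `count=n` stops at the last record even if something else follows the records in the file.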

Answer 1

You can open the file as a standard Python file object, seek past the header, and then pass the file object to fromfile. Like this:

import numpy as np
import os

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

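with open("myfile", "rb") as f:
    # Skip the header and the 4-byte record count; records start at 1863
    f.seek(1863, os.SEEK_SET)
    # fromfile reads from the file object's current position to the end
    data = np.fromfile(f, dtype=dtype)

print(data)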
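The same seek-then-fromfile idea can use the record count stored at offset 1859 instead of reading to the end of the file. This matters if anything follows the records. It also works on NumPy versions older than 1.17, which lack `fromfile`'s `offset` argument. Here is a sketch that assumes little-endian data and uses a hypothetical helper `read_records_seek`, tested on a synthetic file:

```python
import os
import tempfile

import numpy as np

# Question's record layout; little-endian byte order is an assumption.
record_dtype = np.dtype([(name, "<f4") for name in
                         ("time", "PosX", "PosY", "Alt",
                          "Qx", "Qy", "Qz", "Qw", "dist")])


def read_records_seek(path, count_offset=1859):
    """Hypothetical helper: seek to the count, then read exactly n records."""
    with open(path, "rb") as f:
        f.seek(count_offset, os.SEEK_SET)
        n = int(np.frombuffer(f.read(4), dtype="<u4")[0])  # uint32 record count
        # f is now at count_offset + 4 (1863 for the question's file).
        return np.fromfile(f, dtype=record_dtype, count=n)


# Synthetic file: zero-filled header, count = 2, two records, then one
# extra record's worth of trailing bytes that count=n keeps out of the result.
demo = np.zeros(2, dtype=record_dtype)
demo["Alt"] = [100.0, 200.0]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1859)
    tmp.write(np.array([len(demo)], dtype="<u4").tobytes())
    tmp.write(demo.tobytes())
    tmp.write(b"\xff" * record_dtype.itemsize)
    path = tmp.name

records = read_records_seek(path)
os.remove(path)
print(records["Alt"])
```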

Answer 2

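I had a similar problem, but none of the answers above satisfied me. I needed to implement something like a virtual table over a very large number of binary records, which could take more memory than I can afford in a single numpy array. So my question was how to read and write a small set of integers from/to a binary file: a subset of the file into a subset of a numpy array.

Here is a solution that worked for me: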

import numpy as np
recordLen = 10 # number of int64's per record
recordSize = recordLen * 8 # size of a record in bytes
memArray = np.zeros(recordLen, dtype=np.int64) # a buffer for 1 record

# Create a binary file and open it for write+read
with open('BinaryFile.dat', 'w+b') as file:
    # Writing the array into the file as record recordNo:
    recordNo = 200 # the index of a target record in the file
    file.seek(recordSize * recordNo)
    raw = memArray.tobytes()
    file.write(raw)

    # Reading a record recordNo from file into the memArray
    file.seek(recordSize * recordNo)
    raw = file.read(recordSize)
    memArray = np.frombuffer(raw, dtype=np.int64).copy()
    # Note copy() added to make the memArray mutable
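For the "virtual table" use case, `np.memmap` is an alternative technique to the explicit seek/read above. It maps the file into memory, and the OS only pages in the parts you actually access. Here is a sketch using the same int64 records as this answer. The table size `numRecords` is hypothetical, and the file goes in a temporary directory:

```python
import os
import tempfile

import numpy as np

recordLen = 10      # int64 values per record, as in the answer above
numRecords = 500    # hypothetical table size for this demo

path = os.path.join(tempfile.mkdtemp(), "BinaryFile.dat")

# mode="w+" creates (or overwrites) a file sized for numRecords records.
table = np.memmap(path, dtype=np.int64, mode="w+", shape=(numRecords, recordLen))
table[200] = np.arange(recordLen)   # write record 200 in place
table.flush()
del table                           # release the mapping

# Re-open read/write; np.array() copies just record 200 into an ordinary array.
table = np.memmap(path, dtype=np.int64, mode="r+", shape=(numRecords, recordLen))
record = np.array(table[200])
print(record)
```

`np.memmap` also takes an `offset` argument in bytes. For the question's file, something like `np.memmap(filename, dtype=dtype, mode="r", offset=1863, shape=(12019,))` should map the records past the header without reading them all up front.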

Answer 3

I suggest using numpy frombuffer:

import numpy as np

# file_path, seek_to_position, total_num_bytes and your_dtype_here are placeholders
with open(file_path, 'rb') as file_obj:
    file_obj.seek(seek_to_position)
    data_ro = np.frombuffer(file_obj.read(total_num_bytes), dtype=your_dtype_here)
    data_rw = data_ro.copy()  # without copy(), the result is read-only
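Applied to the question's file, `total_num_bytes` can be derived from the header count as count × record size. Here is a sketch that assumes little-endian data and uses a hypothetical helper `read_records_frombuffer`. It reads the uint32 with `struct` and checks for a short read before calling `frombuffer`:

```python
import os
import struct
import tempfile

import numpy as np

# Question's record layout; little-endian byte order is an assumption.
record_dtype = np.dtype([(name, "<f4") for name in
                         ("time", "PosX", "PosY", "Alt",
                          "Qx", "Qy", "Qz", "Qw", "dist")])


def read_records_frombuffer(file_path, count_offset=1859):
    """Hypothetical helper: derive total_num_bytes from the header count."""
    with open(file_path, "rb") as file_obj:
        file_obj.seek(count_offset)
        (n,) = struct.unpack("<I", file_obj.read(4))     # uint32 record count
        total_num_bytes = n * record_dtype.itemsize       # 36 bytes per record
        raw = file_obj.read(total_num_bytes)
    if len(raw) != total_num_bytes:
        raise ValueError("file ends before the last record")
    return np.frombuffer(raw, dtype=record_dtype).copy()  # copy() -> writable


# Synthetic test file: zero-filled header, count = 4, four records.
demo = np.zeros(4, dtype=record_dtype)
demo["PosX"] = [1.0, 2.0, 3.0, 4.0]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1859)
    tmp.write(struct.pack("<I", len(demo)))
    tmp.write(demo.tobytes())
    path = tmp.name

records = read_records_frombuffer(path)
os.remove(path)
print(records["PosX"])
```

The length check is there because a truncated read that happens to be a whole number of records would otherwise make `frombuffer` silently return fewer records than the header promises.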
