Using numpy.fromfile to read scattered binary data

Question

I want to read several separate blocks of a binary file with a single call to numpy.fromfile. Each block has the following format:

OES=[
('EKEY','i4',1), 
('FD1','f4',1),
('EX1','f4',1),
('EY1','f4',1),
('EXY1','f4',1),
('EA1','f4',1),
('EMJRP1','f4',1),
('EMNRP1','f4',1),
('EMAX1','f4',1),
('FD2','f4',1),
('EX2','f4',1),
('EY2','f4',1),
('EXY2','f4',1),
('EA2','f4',1),
('EMJRP2','f4',1),
('EMNRP2','f4',1),
('EMAX2','f4',1)]

The binary file is laid out as follows:

 Data I want (OES format repeating n times)
 ------------------------
 Useless Data
 ------------------------
 Data I want (OES format repeating m times)
 ------------------------
 etc..

I know the number of bytes between the data I want and the useless data. I also know the size of each block of data I want.

So far I have achieved this by seeking on the file object f and then calling:

nparr = np.fromfile(f,dtype=OES,count=size)

That gives me a separate nparr for each block I want, and I then concatenate all of those numpy arrays into one new array.

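My goal is to end up with a single array containing all the blocks I want, without concatenating (for memory reasons). That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?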

Answer 1

That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?

No, not with a single call to fromfile().

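But if you know the complete layout of the file in advance, you can preallocate the array and then use fromfile and seek to read the OES blocks directly into the preallocated array. For example, suppose you know the file position of each OES block and the number of records in each block. That is, you know: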

file_positions = [position1, position2, ...]
numrecords = [n1, n2, ...]
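If only the block sizes and the gaps between them are known, as in the question, the absolute positions can be derived by accumulating byte counts. The following is a minimal sketch under stated assumptions: the file starts with a wanted block, `gap_bytes` (a hypothetical name) holds the number of useless bytes after each block, and the two-field `REC` dtype is a shortened stand-in for OES.

```python
import numpy as np

# Shortened, hypothetical stand-in for the OES dtype (8 bytes per record).
REC = np.dtype([('EKEY', 'i4'), ('FD1', 'f4')])

def block_positions(numrecords, gap_bytes, itemsize, start=0):
    """Return the byte offset of each wanted block.

    numrecords[i] is the record count of block i; gap_bytes[i] is the
    number of useless bytes that follow block i (assumed layout).
    """
    positions = []
    pos = start
    for n, gap in zip(numrecords, gap_bytes):
        positions.append(pos)
        pos += n * itemsize + gap  # skip this block, then the junk after it
    return positions

numrecords = [3, 2, 4]      # illustrative values only
gap_bytes = [10, 20, 0]     # illustrative values only
file_positions = block_positions(numrecords, gap_bytes, REC.itemsize)
print(file_positions)
```

With the real OES dtype, `np.dtype(OES).itemsize` would take the place of `REC.itemsize`.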

Then you can do this (assuming f is the already-opened file):

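total = sum(numrecords)
# Allocate the final array once, sized for all wanted records.
nparr = np.empty(total, dtype=OES)
current_index = 0
for pos, n in zip(file_positions, numrecords):
    # Jump to the start of this block and copy its n records into place.
    f.seek(pos)
    nparr[current_index:current_index+n] = np.fromfile(f, count=n, dtype=OES)
    current_index += n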
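To try the approach end to end, the sketch below writes a small synthetic file (two wanted blocks separated by junk bytes) and reads it back into one preallocated array. The shortened `REC` dtype, the helper names `read_blocks` and `read_blocks_readinto`, and the file layout are illustrative assumptions, not part of the original post. The second helper is a variation, not the answer's method: it calls the file object's `readinto` on a byte view of each slice, which should avoid the temporary array that `np.fromfile` creates for every block.

```python
import os
import tempfile
import numpy as np

# Shortened stand-in for the OES dtype (assumption for this demo).
REC = np.dtype([('EKEY', 'i4'), ('FD1', 'f4'), ('EX1', 'f4')])

def read_blocks(f, file_positions, numrecords, dtype):
    """Preallocate once, then seek + np.fromfile for each block."""
    out = np.empty(sum(numrecords), dtype=dtype)
    i = 0
    for pos, n in zip(file_positions, numrecords):
        f.seek(pos)
        out[i:i + n] = np.fromfile(f, dtype=dtype, count=n)
        i += n
    return out

def read_blocks_readinto(f, file_positions, numrecords, dtype):
    """Variation: read raw bytes straight into the preallocated array."""
    out = np.empty(sum(numrecords), dtype=dtype)
    i = 0
    for pos, n in zip(file_positions, numrecords):
        f.seek(pos)
        target = out[i:i + n].view(np.uint8)  # writable byte view of the slice
        nread = f.readinto(target)
        if nread != target.nbytes:
            raise EOFError("block at offset %d is truncated" % pos)
        i += n
    return out

# Synthetic file: block A (3 records), 5 junk bytes, block B (2 records).
block_a = np.zeros(3, dtype=REC)
block_a['EKEY'] = [1, 2, 3]
block_b = np.zeros(2, dtype=REC)
block_b['EKEY'] = [4, 5]
junk = b'\xff' * 5

path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as f:
    f.write(block_a.tobytes())
    f.write(junk)
    f.write(block_b.tobytes())

numrecords = [3, 2]
file_positions = [0, block_a.nbytes + len(junk)]

with open(path, 'rb') as f:
    arr1 = read_blocks(f, file_positions, numrecords, REC)
    arr2 = read_blocks_readinto(f, file_positions, numrecords, REC)

print(arr1['EKEY'])
```

Both helpers assume the byte order and field layout in the file match `dtype` exactly. The field tuples here omit the trailing `1` used in the question's OES list, since recent NumPy versions may warn about that form.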

