Read binary data with a header, written from C, in Python

Question

I wrote a file in binary format using C. The format I used is as follows:

A header made of 5 doubles (40 bytes in total):

fwrite(&FirstNum, sizeof(double), 1, outFile);
fwrite(&SecNum, sizeof(double), 1, outFile);
fwrite(&ThirdNum, sizeof(double), 1, outFile);
fwrite(&FourthNum, sizeof(double), 1, outFile);           
fwrite(&FifthNum, sizeof(double), 1, outFile);

Then I run a for loop over 256^3 "particles". For each particle I write 9 values: the first is an integer and the other 8 are doubles, as follows:

Ntot = 256*256*256;
for(i=0; i<Ntot; i++ )
  {
    fwrite(&gp[i].GID, sizeof(int), 1, outFile);

    /*----- Positions -----*/
    pos_aux[X] = gp[i].pos[X];
    pos_aux[Y] = gp[i].pos[Y];
    pos_aux[Z] = gp[i].pos[Z];

    fwrite(&pos_aux[0], sizeof(double), 3, outFile);  //Positions in 3D
    fwrite(&gp[i].DenConCell, sizeof(double), 1, outFile); //Density
    fwrite(&gp[i].poten_r[0], sizeof(double), 1, outFile); //Field 1
    fwrite(&gp[i].potDot_r[0], sizeof(double), 1, outFile); //Field 2
    fwrite(&gp[i].potDot_app1[0], sizeof(double), 1, outFile); //Field 3
    fwrite(&gp[i].potDot_app2[0], sizeof(double), 1, outFile); //Field 4
  }

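where gp is just a data structure holding the information of my particles. So, for each of the 256^3 particles, I use 68 bytes in total: 4 bytes for the int + 8*(8 bytes) for the doubles.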

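What I need is to read this format in Python in order to make some plots, but I am fairly new to Python. I have read some answers about reading binary files with Python, but I can only read my header, not the "body", i.e. the rest of the information about the particles. What I have tried is the following: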

import struct

Npart = 256
with open("./path/to/my/binary/file.bin", 'rb') as bdata:
    header_size = 40 # in bytes           
    bheader = bdata.read(40)
    header_data = struct.unpack('ddddd', bheader)
    FirstNum = header_data[0]
    SecNum = header_data[1]
    ThirdNum = header_data[2]
    FourthNum = header_data[3]
    FifthNum = header_data[4]
    #Until here, if I print each number, I obtain the correct values.
    #From here, is what I've tried in order to read the 9 data of the 
    #particles
    bytes_per_part = 68
    body_size = int( (Npart**3) * bytes_per_part )
    body_data_read = bdata.read(body_size)
    #body_data = struct.unpack_from('idddddddd', bdata, offset=40)
    #body_data = struct.unpack('=i 8d', body_data_read) 
    body_data = struct.unpack('<i 8d', body_data_read)

    #+++++ Unpacking data ++++++ 
    ID_us = body_data[0]
    pos_x_us = body_data[1]
    pos_y_us = body_data[2]
    pos_z_us = body_data[3]
    DenCon_us = body_data[4]

But when I run my code, I get this error:

body_data = struct.unpack('<i 8d', body_data_read)
struct.error: unpack requires a string argument of length 68

I also tried the first commented-out line:

#body_data = struct.unpack_from('idddddddd', bdata, offset=40)

but then the error says:

struct.error: unpack requires a string argument of length 72

If I use

    body_data = struct.unpack('=i 8d', body_data_read)

or the line

    body_data = struct.unpack('<i 8d', body_data_read)

I get the error I showed first:

struct.error: unpack requires a string argument of length 68

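Indeed, I think I do not really understand the format characters "=" and "<": with them I get the record length I expect to read, yet I still cannot read the data. What I need in the end is an array called pos_x_us holding all the x positions, pos_y_us holding the y positions, pos_z_us holding the z positions, and so on. I would appreciate any ideas or pointers on how to get what I need.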

Answer 1

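Your problem arises because the buffer size does not match the format. Let's try with some random data: 12 bytes in total, enough for an int (4 bytes) and a double (8 bytes).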

>>> data = '\xf4\x9f\x97\xcd\xf2\xbe\xd6\x87\x18\xe3\x17\xdf'

If you do not use '<', '>', '=' or '!', native alignment is used and padding is applied. From the documentation:

Padding is only automatically added between successive structure members. No padding is added at the beginning or the end of the encoded struct.

>>> struct.unpack('id', data)

Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    struct.unpack('id', data)
error: unpack requires a string argument of length 16

But

>>> struct.unpack('=id', data)
(-845701132, -1.2217466572589222e+150)

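More specifically, 'd' on its own takes 8 bytes and 'i' takes 4 bytes. 'iii' takes 12 bytes, which is fine because all members have the same type. With 'id', however, the double has to start on an 8-byte boundary, so 4 bytes of padding are inserted after the int and the format needs 16 bytes. In the same way, 'c' takes 1 byte but 'ci' needs 8 bytes. This is why struct.unpack('ddddd', ...) works for your header: every member has the same type, so no padding is required.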
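The sizes quoted above can be checked with struct.calcsize. This is only a minimal sketch: the variable names are illustrative, and the native (no prefix) results assume a typical platform where a double is aligned to 8 bytes, so they may differ elsewhere.

```python
import struct

# Standard sizes, no padding: the same on every platform.
size_header = struct.calcsize('=5d')    # 5 * 8 = 40
size_record = struct.calcsize('=i8d')   # 4 + 8 * 8 = 68
size_id = struct.calcsize('=id')        # 4 + 8 = 12

# Native mode (no prefix): alignment padding may be inserted between members.
# Typically 16 and 72 on a 64-bit platform, but this is platform-dependent.
native_id = struct.calcsize('id')
native_record = struct.calcsize('i8d')

print(size_header, size_record, size_id, native_id, native_record)
```

The 68 versus 72 difference is the same one that shows up in the two error messages in the question.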

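Your other errors also come from the format not matching the buffer size. With struct.unpack() the buffer size must match the format exactly, whereas with struct.unpack_from() the buffer only has to be at least as large as the format. Let's try with 24 bytes of data.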

# this will fetch 12 bytes, even if the stream has more
>>> struct.unpack_from('=id', 2*data)
(-845701132, -1.2217466572589222e+150)

But

>>> struct.unpack('=id', 2*data)

Traceback (most recent call last):
  File "<pyshell#60>", line 1, in <module>
    struct.unpack('=id', 2*data)
error: unpack requires a string argument of length 12
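For readers on Python 3, where the buffer has to be a bytes object rather than a str, the same behaviour can be sketched as follows. The byte values are the ones used above; the variable names are only illustrative.

```python
import struct

data = b'\xf4\x9f\x97\xcd\xf2\xbe\xd6\x87\x18\xe3\x17\xdf'  # 12 bytes
buf = 2 * data                                              # 24 bytes

# unpack_from reads a single record starting at `offset`
# and ignores whatever follows it in the buffer.
first = struct.unpack_from('=id', buf, offset=0)
second = struct.unpack_from('=id', buf, offset=12)

# unpack insists on an exact size match and raises struct.error otherwise.
try:
    struct.unpack('=id', buf)
    exact_match_failed = False
except struct.error:
    exact_match_failed = True

print(first, second, exact_match_failed)
```

Stepping the offset by the record size is one way to walk through a buffer record by record without building one huge format string.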

As you can now see, the data you read is actually Npart**3 records long:

body_size = int( (Npart**3) * bytes_per_part )
body_data_read = bdata.read(body_size)

To match it, you need the format 'i8di8di8d...', with 'i8d' repeated Npart**3 times. So,

body_data = struct.unpack('='+(Npart**3)*'i8d', body_data_read)

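Now that you have read all the data at once, you can split it up as needed. For example, the second value holds the x coordinate of the first particle, and since this pattern repeats every 9 values, you can get the x coordinates of all particles by slicing.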

pos_x_us = body_data[1::9]
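Putting the pieces together, here is a small end-to-end sketch. It does not read the real file: it builds an in-memory stand-in with the same layout (5 doubles, then records of one int and 8 doubles, no padding) using a tiny npart, and the field values are made up for the demonstration.

```python
import io
import struct

npart = 2                 # tiny stand-in for 256 (assumption for the demo)
ntot = npart ** 3

# Fake file with the layout the C code produces; the values are invented.
fake = io.BytesIO()
fake.write(struct.pack('=5d', 1.0, 2.0, 3.0, 4.0, 5.0))
for i in range(ntot):
    fields = [i + 0.1 * k for k in range(1, 9)]   # 8 doubles per particle
    fake.write(struct.pack('=i8d', i, *fields))
fake.seek(0)

# Read the header, then the whole body with one repeated format.
header = struct.unpack('=5d', fake.read(struct.calcsize('=5d')))
body_fmt = '=' + ntot * 'i8d'
body = struct.unpack(body_fmt, fake.read(struct.calcsize(body_fmt)))

# The pattern repeats every 9 values, so each column is a strided slice.
gid = body[0::9]
pos_x = body[1::9]
pos_y = body[2::9]
pos_z = body[3::9]
den = body[4::9]

print(header, gid, pos_x)
```

For the real data, the BytesIO object would be replaced by the file opened in 'rb' mode and npart set to 256. Note that 256**3 records means roughly 151 million values in a single tuple, so reading the body in chunks may be preferable if memory is tight.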

c

python

binaryfiles