How to properly select wanted data and discard unwanted data from binary files

Question

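I'm working on a project where I'm trying to convert old 16-bit binary data files into 32-bit data files for later use.

The conversion itself was straightforward, but then I noticed that I need to remove the header data from the data file.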

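The data consists of 8206-byte frames. Each frame is made up of a 14-byte header and an 8192-byte data block (4096 16-bit samples), and each file contains either 70313 or 70312 frames, depending on the file.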

I couldn't find a neat way to locate all the headers, remove them, and save only the data blocks to a new file.

So here is what I did:

    results_array = np.empty([0,1], np.uint16)

    for filename in file_list:
        num_files += 1
        # read data from file as 16bit's and save it as 32bit
        data16 = np.fromfile(data_dir + "/" + filename, dtype=np.uint16)
        filesize = np.prod(data16.shape)
        if filesize == 288494239:
            total_frames = 70313
            #total_frames = 3000
        else:
            total_frames = 70312
            #total_frames = 3000

        frame_count = 0
        chunksize = 4103

        with open(data_dir + "/" + filename, 'rb') as file:
            while frame_count < total_frames:
                frame_count += 1
                read_data = file.read(chunksize)
                if not read_data:
                    break
                data = read_data[7:4103]
                results_array = np.append(results_array,data)
                converted = np.frombuffer(results_array, np.uint16)
                print(str(frame_count) + "/" + str(total_frames))

            converted = np.frombuffer(results_array, np.uint16)
            data32 = converted.astype(dtype=np.uint32) * 256

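It works (at least I think it works), but it is very, very slow.

So the question is: is there a faster way to do the above, perhaps with some built-in NumPy function or something similar?

Thanks in advance.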

Answer 1

Finally cracked this one, and it is about 100 times faster than the original approach :)

    data = np.fromfile(read_dir + "/" + file, dtype=np.int16)
    frames = len(data) // 4103  # frame length in 16-bit words (8206 bytes)

    # Reshape so that each row is one frame; drop any trailing partial frame
    data = np.reshape(data[:frames * 4103], (frames, 4103))

    # Drop the 7-word (14-byte) header and convert to int32
    data = data[:, 7:].astype(np.int32) * 256
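
For anyone who wants to check the approach before running it on real files, here is a minimal self-contained sketch of the same reshape-and-slice idea on a small synthetic file. The names `strip_headers`, `HEADER_WORDS`, `DATA_WORDS` and `FRAME_WORDS` are made up for illustration. It also assumes the samples are unsigned (`np.uint16`, as in the question) rather than `np.int16` as in the snippet above; swap the dtypes if your data is signed. The word counts follow from the sizes in the question: 14 bytes is 7 16-bit words, and 8206 bytes is 4103 words, which leaves 4096 words of data per frame.

```python
import os
import tempfile

import numpy as np

HEADER_WORDS = 7                          # 14-byte header = 7 16-bit words
DATA_WORDS = 4096                         # 8192-byte data block
FRAME_WORDS = HEADER_WORDS + DATA_WORDS   # 4103 words = 8206 bytes


def strip_headers(path):
    """Return the data blocks of `path` as a (frames, 4096) uint32 array.

    Hypothetical helper; assumes unsigned 16-bit samples.
    """
    raw = np.fromfile(path, dtype=np.uint16)
    frames = raw.size // FRAME_WORDS      # ignore any trailing partial frame
    framed = raw[:frames * FRAME_WORDS].reshape(frames, FRAME_WORDS)
    # Widen to 32 bits before scaling so the multiplication cannot overflow.
    return framed[:, HEADER_WORDS:].astype(np.uint32) * 256


# Build a small synthetic file: 3 frames whose header words are all 0xFFFF.
n_frames = 3
payload = (np.arange(n_frames * DATA_WORDS) % 65536).astype(np.uint16)
payload = payload.reshape(n_frames, DATA_WORDS)
header = np.full((n_frames, HEADER_WORDS), 0xFFFF, dtype=np.uint16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.bin")
    np.hstack([header, payload]).tofile(path)
    result = strip_headers(path)

# The headers are gone and every sample has been scaled by 256.
expected = payload.astype(np.uint32) * 256
print(result.shape, result.dtype, np.array_equal(result, expected))
```

The speedup most likely comes from reading the file once and slicing it as a 2-D array, instead of calling `np.append` inside a loop, which copies the whole accumulated array on every iteration.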

binary

numpy

data-conversion

python-3.6