How to understand the base64 encoding in the VTK binary file format

I have a question about how to interpret a binary DataArray, specifically its base64 encoding.

The manual says that if the format of a DataArray is binary,

The data are encoded in base64 and listed contiguously inside the
DataArray element. Data may also be compressed before encoding in base64. The byte-
order of the data matches that specified by the byte_order attribute of the VTKFile element.

I could not fully understand this, so I generated an ASCII file and a binary file for the same model.

ASCII file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="4" NumberOfCells="1">
      <PointData>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="0" RangeMax="1.4142135624">
          0 0 0 1 0 0
          1 1 0 0 1 1
        </DataArray>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="ascii" RangeMin="0" RangeMax="3">
          0 1 2 3
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="ascii" RangeMin="4" RangeMax="4">
          4
        </DataArray>
        <DataArray type="UInt8" Name="types" format="ascii" RangeMin="10" RangeMax="10">
          10
        </DataArray>
      </Cells>
    </Piece>
  </UnstructuredGrid>
</VTKFile>

Binary file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="4" NumberOfCells="1">
      <PointData>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary" RangeMin="0" RangeMax="1.4142135624">
          AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
        </DataArray>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="3">
          AQAAAACAAAAgAAAAEwAAAA==eJxjYIAARijNBKWZoTQAAHAABw==
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="4" RangeMax="4">
          AQAAAACAAAAIAAAACwAAAA==eJxjYYAAAAAoAAU=
        </DataArray>
        <DataArray type="UInt8" Name="types" format="binary" RangeMin="10" RangeMax="10">
          AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL
        </DataArray>
      </Cells>
    </Piece>
  </UnstructuredGrid>
</VTKFile>

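When I look at the DataArray elements, taking the last one as an example, I cannot work out the relationship between AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL and 10.

My understanding can be expressed with the following code, but it produces CgAAAA== instead.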

#include "base64.h" // https://github.com/superwills/NibbleAndAHalf/blob/master/NibbleAndAHalf/base64.h
#include <iostream>
int main()
{
    int x = 10;
    int len;

    // first arg: binary buffer
    // second arg: length of binary buffer
    // third arg: length of ascii buffer
    char *ascii = base64((char *)&x, sizeof(int), &len);
    
    std::cout << ascii << std::endl;
    std::cout << len << std::endl;
    free(ascii);
    return 0;
}

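Can someone explain to me how the conversion works? (There is also another related topic.)

Thank you for your time.

The solution can be found in this discussion: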

https://discourse.vtk.org/t/how-to-understand-binary-dataarray-in-xml-vtk-output/4489

The long extra data comes from the compression header.
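To make that concrete, here is a minimal Python sketch that decodes the "types" array from the question. It assumes header_type="UInt32", byte_order="LittleEndian" and zlib compression, as declared in the file above. The function name `decode_compressed_dataarray` is made up for this illustration, and the header layout (number of blocks, block size, last block size, then one compressed size per block) is my reading of the format rather than a quote from the manual.

```python
import base64
import struct
import zlib

def decode_compressed_dataarray(text):
    """Return (header, raw_bytes) for a zlib-compressed inline binary DataArray.

    Assumes header_type="UInt32" and byte_order="LittleEndian".
    """
    text = text.strip()
    # The first three UInt32 values (12 bytes) are exactly 16 base64 characters.
    num_blocks, block_size, last_block_size = struct.unpack(
        "<3I", base64.b64decode(text[:16]))
    # The full header holds 3 + num_blocks values and is its own base64 chunk,
    # padded independently of the payload (hence the "==" in the middle).
    header_bytes = 4 * (3 + num_blocks)
    header_chars = 4 * ((header_bytes + 2) // 3)
    header = struct.unpack(f"<{3 + num_blocks}I",
                           base64.b64decode(text[:header_chars]))
    payload = base64.b64decode(text[header_chars:])
    # Decompress each block using the compressed sizes stored in the header.
    raw, pos = b"", 0
    for comp_size in header[3:]:
        raw += zlib.decompress(payload[pos:pos + comp_size])
        pos += comp_size
    return header, raw

header, raw = decode_compressed_dataarray("AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL")
print(header)                     # (1, 32768, 1, 9)
print(struct.unpack("<1B", raw))  # (10,)
```

So the first 24 characters are the header (1 block, block size 32768, 1 uncompressed byte, 9 compressed bytes), and the remaining 12 characters are the zlib stream that expands to the single UInt8 value 10.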

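I found a solution and wrote the answer in a VTK support question, but I am posting it here as well in case anyone else arrives with the same problem the two of us had.

Note that I program in Python, but I believe base64 and zlib functions are available in C++ too. Also, I use numpy to define the arrays, but I believe std::vector can be used equivalently in C++.

So, suppose we want to write the single-precision float32 array named "Points" from your example. Assuming header_type="UInt32", in Python we would do: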

import numpy as np
import zlib
import base64

# write the float array.
arr = np.array([0, 0, 0, 1, 0, 0,
                1, 1, 0, 0, 1, 1], dtype='float32')
# generate a zlib compressed array. This outputs a python byte type
arr_comp = zlib.compress(arr)

# generate the uncompressed header
header = np.array([ 1,  # number of blocks (one, since the array fits in a single block)
                2**15,  # uncompressed block size
           arr.nbytes,  # uncompressed size of the last (here: only) block
       len(arr_comp)],  # compressed size of that block
                  dtype='uint32')  # because of header_type="UInt32"

# use base64 encoding when writing to file
# `.decode("utf-8")` transforms the python byte type to a string
print((base64.b64encode(header) + base64.b64encode(arr_comp)).decode("utf-8"))

The output is as expected:

AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=

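According to the zlib Python docs, 2**15 is also the size of the history buffer (or "window size") used when compressing data. In the VTK header, though, the second entry is the uncompressed block size: the data is split into blocks of at most 2**15 bytes, and each block is compressed on its own.


Edit: the code above only works when the size of the array in bytes is less than or equal to 2**15. In the VTK support question, I extended it to the case of larger arrays. You have to split the array into blocks.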
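A rough sketch of that block-wise version, using only the standard library so it can be pasted and run as-is (with numpy, pass `arr.tobytes()`). The function name `encode_compressed_dataarray` is hypothetical. The convention that the third header entry is 0 when the last block is exactly full is my understanding of how the VTK writer behaves, so treat it as an assumption.

```python
import base64
import struct
import zlib

def encode_compressed_dataarray(raw, block_size=2**15):
    """Encode raw little-endian bytes as a zlib-compressed inline DataArray string.

    Assumes header_type="UInt32"; block_size is the uncompressed block size.
    """
    # Split the data into blocks of at most block_size bytes.
    blocks = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
    compressed = [zlib.compress(block) for block in blocks]
    # Size of the trailing partial block; 0 when the last block is full.
    last_block_size = len(raw) % block_size
    header = struct.pack(f"<{3 + len(blocks)}I",
                         len(blocks), block_size, last_block_size,
                         *(len(c) for c in compressed))
    # Header and payload are base64-encoded separately, then concatenated.
    return (base64.b64encode(header)
            + base64.b64encode(b"".join(compressed))).decode("ascii")

# 20000 float32 values = 80000 bytes, which needs three blocks.
big = struct.pack("<20000f", *range(20000))
encoded = encode_compressed_dataarray(big)
print(struct.unpack("<3I", base64.b64decode(encoded[:16])))  # (3, 32768, 14464)
```

For an array that fits in one block, this reduces to the single-block header from the code above: one block, block size 2**15, and the array's byte size as the last block size.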