How to understand the base64 encoding in the VTK binary file format

Question

I have a question about how to interpret a binary DataArray, specifically its base64 encoding.

The manual says that if the format of a DataArray is binary,

The data are encoded in base64 and listed contiguously inside the
DataArray element. Data may also be compressed before encoding in base64. The byte-
order of the data matches that specified by the byte_order attribute of the VTKFile element.

I could not fully understand this, so I generated an ASCII file and a binary file for the same model.

ASCII file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="4" NumberOfCells="1">
      <PointData>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="0" RangeMax="1.4142135624">
          0 0 0 1 0 0
          1 1 0 0 1 1
        </DataArray>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="ascii" RangeMin="0" RangeMax="3">
          0 1 2 3
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="ascii" RangeMin="4" RangeMax="4">
          4
        </DataArray>
        <DataArray type="UInt8" Name="types" format="ascii" RangeMin="10" RangeMax="10">
          10
        </DataArray>
      </Cells>
    </Piece>
  </UnstructuredGrid>
</VTKFile>

Binary file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="4" NumberOfCells="1">
      <PointData>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary" RangeMin="0" RangeMax="1.4142135624">
          AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
        </DataArray>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="3">
          AQAAAACAAAAgAAAAEwAAAA==eJxjYIAARijNBKWZoTQAAHAABw==
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="4" RangeMax="4">
          AQAAAACAAAAIAAAACwAAAA==eJxjYYAAAAAoAAU=
        </DataArray>
        <DataArray type="UInt8" Name="types" format="binary" RangeMin="10" RangeMax="10">
          AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL
        </DataArray>
      </Cells>
    </Piece>
  </UnstructuredGrid>
</VTKFile>

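Looking at the DataArray elements, and taking the last one as an example, I cannot work out the relationship between AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL and 10.

My understanding can be expressed by the code below, but it produces CggAAA== instead.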

#include "base64.h" // https://github.com/superwills/NibbleAndAHalf/blob/master/NibbleAndAHalf/base64.h
#include <iostream>
int main()
{
    int x = 10;
    int len;

    // first arg: binary buffer
    // second arg: length of binary buffer
    // third arg: length of ascii buffer
    char *ascii = base64((char *)&x, sizeof(int), &len);
    
    std::cout << ascii << std::endl;
    std::cout << len << std::endl;
    free(ascii);
    return 0;
}

Could someone explain how the conversion works? Another related thread is

https://discourse.vtk.org/t/error-when-writing-binary-vtk-files/4487/7

Thank you for your time.

Answer 1

The solution can be found in this discussion:

https://discourse.vtk.org/t/how-to-understand-binary-dataarray-in-xml-vtk-output/4489

The long extra data comes from the compression header.
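To make the relationship concrete, here is a minimal Python sketch that decodes the "types" array from the question. It assumes header_type="UInt32", little-endian byte order, and a header laid out as [number of blocks, uncompressed block size, size of the last block, compressed size of each block, ...]. The name decode_vtk_compressed is a hypothetical helper, not a VTK function.

```python
import base64
import math
import struct
import zlib


def decode_vtk_compressed(text, fmt="B"):
    """Decode one compressed binary DataArray (hypothetical helper).

    Assumes UInt32 header entries and little-endian data.
    `fmt` is a struct type code: "B" = UInt8, "q" = Int64, "f" = Float32.
    """
    text = text.strip()
    # The first three UInt32 values are 12 bytes, i.e. exactly 16 base64 chars.
    nblocks, _block_size, _last_size = struct.unpack(
        "<3I", base64.b64decode(text[:16]))
    # The header is base64-encoded separately from the data, with its own padding.
    header_bytes = 4 * (3 + nblocks)
    header_chars = 4 * math.ceil(header_bytes / 3)
    header = struct.unpack(f"<{3 + nblocks}I",
                           base64.b64decode(text[:header_chars]))
    payload = base64.b64decode(text[header_chars:])
    # Decompress each block using the compressed sizes listed in the header.
    raw, offset = b"", 0
    for size in header[3:]:
        raw += zlib.decompress(payload[offset:offset + size])
        offset += size
    count = len(raw) // struct.calcsize("<" + fmt)
    return header, struct.unpack(f"<{count}{fmt}", raw)


header, values = decode_vtk_compressed("AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL")
print(header)  # (1, 32768, 1, 9)
print(values)  # (10,)
```

So the first 24 characters are the header, and only eJzjAgAACwAL is the zlib-compressed single byte with value 10.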

Answer 2

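I found the solution and wrote up the answer in a VTK support question, but I am posting it here too in case someone arrives with the same question the two of us had.

Note that I program in Python, but I believe C++ has equivalent base64 and zlib functions. I also use numpy to define the arrays, but I believe std::vector can serve the same purpose in C++.

So, suppose we want to write the single-precision float32 array named "Points" from your example. Assuming header_type="UInt32", in Python we would do: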

import numpy as np
import zlib
import base64

# write the float array.
arr = np.array([0, 0, 0, 1, 0, 0,
                1, 1, 0, 0, 1, 1], dtype='float32')
# generate a zlib compressed array. This outputs a python byte type
arr_comp = zlib.compress(arr)

# generate the uncompressed header
header = np.array([ 1,  # number of blocks (1 here, as the array fits in one block)
                2**15,  # block size before compression, in bytes
           arr.nbytes,  # uncompressed size of the last (here: only) block
       len(arr_comp)],  # compressed size of the block, in bytes
                  dtype='uint32')  # because of header_type="UInt32"

# use base64 encoding when writing to file
# `.decode("utf-8")` transforms the python byte type to a string
print((base64.b64encode(header) + base64.b64encode(arr_comp)).decode("utf-8"))

The output is as expected:

AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=

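According to the zlib Python docs, 2**15 is the size of the history buffer (the "window size") used when compressing data, although I am not sure what that means here. In the VTK header, the value appears to act as the maximum uncompressed block size, which fits the edit below.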

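Edit: the code above only works when the array's size in bytes is less than or equal to 2**15. In the VTK support question I extended it to handle larger arrays: you have to split the data into blocks.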
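A rough sketch of the block-splitting idea, using only the Python standard library. It assumes the same header layout as above (number of blocks, block size, size of the last partial block or 0 if the data divides evenly, then one compressed size per block) with UInt32 entries. The name encode_vtk_compressed is hypothetical, and this is not taken from the linked VTK thread.

```python
import base64
import struct
import zlib


def encode_vtk_compressed(raw, block_size=2**15):
    """Encode raw little-endian array bytes as a compressed DataArray payload.

    Hypothetical helper; assumes header_type="UInt32".
    """
    # Split the raw bytes into blocks of at most `block_size` bytes.
    blocks = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
    # Each block is compressed independently.
    compressed = [zlib.compress(block) for block in blocks]
    # Size of the last partial block; 0 when the data fills the blocks exactly.
    last_size = len(raw) % block_size
    header = struct.pack(
        f"<{3 + len(blocks)}I",
        len(blocks), block_size, last_size, *[len(c) for c in compressed])
    # Header and data are base64-encoded separately, then concatenated.
    encoded = base64.b64encode(header) + base64.b64encode(b"".join(compressed))
    return encoded.decode("ascii")


# Example: 102400 bytes need four blocks (three full ones plus 4096 bytes).
text = encode_vtk_compressed(bytes(range(256)) * 400)
print(struct.unpack("<3I", base64.b64decode(text[:16])))  # (4, 32768, 4096)
```

With NumPy, the raw bytes can be obtained with arr.tobytes() before calling the helper.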
