How to understand the base64 encoding in the VTK binary file format
I have a question about how to interpret a binary DataArray, specifically its base64 encoding.
The manual says that if the format of a DataArray is binary:
The data are encoded in base64 and listed contiguously inside the DataArray element. Data may also be compressed before encoding in base64. The byte-order of the data matches that specified by the byte_order attribute of the VTKFile element.
I could not fully understand this, so I produced an ASCII file and a binary file for the same model.
ASCII file
<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
<UnstructuredGrid>
<Piece NumberOfPoints="4" NumberOfCells="1">
<PointData>
</PointData>
<CellData>
</CellData>
<Points>
<DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="0" RangeMax="1.4142135624">
0 0 0 1 0 0
1 1 0 0 1 1
</DataArray>
</Points>
<Cells>
<DataArray type="Int64" Name="connectivity" format="ascii" RangeMin="0" RangeMax="3">
0 1 2 3
</DataArray>
<DataArray type="Int64" Name="offsets" format="ascii" RangeMin="4" RangeMax="4">
4
</DataArray>
<DataArray type="UInt8" Name="types" format="ascii" RangeMin="10" RangeMax="10">
10
</DataArray>
</Cells>
</Piece>
</UnstructuredGrid>
</VTKFile>
Binary file
<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
<UnstructuredGrid>
<Piece NumberOfPoints="4" NumberOfCells="1">
<PointData>
</PointData>
<CellData>
</CellData>
<Points>
<DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary" RangeMin="0" RangeMax="1.4142135624">
AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
</DataArray>
</Points>
<Cells>
<DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="3">
AQAAAACAAAAgAAAAEwAAAA==eJxjYIAARijNBKWZoTQAAHAABw==
</DataArray>
<DataArray type="Int64" Name="offsets" format="binary" RangeMin="4" RangeMax="4">
AQAAAACAAAAIAAAACwAAAA==eJxjYYAAAAAoAAU=
</DataArray>
<DataArray type="UInt8" Name="types" format="binary" RangeMin="10" RangeMax="10">
AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL
</DataArray>
</Cells>
</Piece>
</UnstructuredGrid>
</VTKFile>
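When I look at a DataArray, taking the last one as an example, I cannot work out the relationship between AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL and 10.
My understanding can be expressed by the code below, but it produces CggAAA==.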
#include "base64.h" // https://github.com/superwills/NibbleAndAHalf/blob/master/NibbleAndAHalf/base64.h
#include <cstdlib>  // free
#include <iostream>
int main()
{
    int x = 10;
    int len;
    // first arg: binary buffer
    // second arg: length of binary buffer
    // third arg: length of ascii buffer
    char *ascii = base64((char *)&x, sizeof(int), &len);
    std::cout << ascii << std::endl;
    std::cout << len << std::endl;
    free(ascii);
    return 0;
}
Can someone explain to me how the conversion works?
See also another related topic.
Thank you for your time.
The solution can be found in this discussion:
https://discourse.vtk.org/t/how-to-understand-binary-dataarray-in-xml-vtk-output/4489
The long extra data comes from the compression header.
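To make the link between AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL and 10 concrete, here is a minimal decoding sketch. It assumes header_type="UInt32", little-endian byte order, zlib compression, and that the header and the compressed data are base64-encoded as two separate pieces, which is what the strings in the file above suggest. The function name decode_vtk_binary is made up for this illustration and is not part of VTK.

```python
import base64
import struct
import zlib


def decode_vtk_binary(text):
    """Decode one compressed, base64-encoded DataArray payload into raw bytes.

    Assumes a little-endian UInt32 header laid out as
    [number of blocks, block size, size of last partial block, compressed sizes...].
    """
    text = text.strip()
    # The first three header words take 12 bytes, i.e. exactly 16 base64 characters.
    n_blocks, _block_size, _last_size = struct.unpack("<3I", base64.b64decode(text[:16]))

    # The full header holds 3 + n_blocks words and is base64-encoded on its own,
    # so its encoded length is rounded up to a multiple of 4 characters.
    header_bytes = 4 * (3 + n_blocks)
    header_chars = 4 * ((header_bytes + 2) // 3)
    header = base64.b64decode(text[:header_chars])
    comp_sizes = struct.unpack(f"<{n_blocks}I", header[12:header_bytes])

    # Everything after the header is the concatenation of the compressed blocks.
    data = base64.b64decode(text[header_chars:])
    out = bytearray()
    offset = 0
    for size in comp_sizes:
        out += zlib.decompress(data[offset:offset + size])
        offset += size
    return bytes(out)


raw = decode_vtk_binary("AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL")
print(list(raw))  # the single UInt8 value of the "types" array
```

For the "types" array the header decodes to 1 block, a block size of 32768, 1 uncompressed byte, and 9 compressed bytes; decompressing those 9 bytes gives the single byte 10. The raw bytes of the other arrays can be interpreted in the same way according to their type attribute, for example with struct.unpack("<12f", ...) for the Float32 points.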
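I found the solution and wrote the answer in a VTK support question, but I am also writing it here in case someone comes here with the same problem the two of us had.
Note that I program in Python, but I believe base64 and zlib functions are available in C++ as well. I also use numpy to define the arrays, but I believe std::vector can be used equivalently in C++.
So, suppose we want to write the single-precision float32 array named "Points" from your example. Assuming a header type of "UInt32", in Python we would do: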
import numpy as np
import zlib
import base64

# the float array to write
arr = np.array([0, 0, 0, 1, 0, 0,
                1, 1, 0, 0, 1, 1], dtype='float32')

# zlib-compress the raw array bytes; this returns a Python bytes object
arr_comp = zlib.compress(arr.tobytes())

# generate the uncompressed header
header = np.array([1,               # number of blocks (the data fits in one block here)
                   2**15,           # block size before compression
                   arr.nbytes,      # the size of the array `arr` in bytes
                   len(arr_comp)],  # the size of the compressed array
                  dtype='uint32')   # because of header_type="UInt32"

# base64-encode the header and the compressed data separately when writing to file
# `.decode("utf-8")` turns the Python bytes object into a string
print((base64.b64encode(header.tobytes()) + base64.b64encode(arr_comp)).decode("utf-8"))
The output is as expected:
AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
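According to the zlib Python docs, 2**15 is also the size of the history buffer (the "window size") used when compressing data, although I am not sure that is what it means here. In the VTK header, the second entry is the block size before compression.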
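Edit: the code above only works when the size of the array in bytes is less than or equal to 2**15. In the VTK support question I extended it to the case of larger arrays. You have to split the data into blocks.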
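A minimal sketch of the block-wise variant, assuming the header layout [number of blocks, block size, size of last partial block, compressed size of each block] with little-endian UInt32 entries, and a last-partial-block size of 0 when the data is an exact multiple of the block size. The function name encode_vtk_binary is made up for this illustration; it takes raw bytes, so a numpy array would be passed as arr.tobytes().

```python
import base64
import struct
import zlib


def encode_vtk_binary(raw, block_size=2**15):
    """Compress raw bytes block by block and return the base64 text for a DataArray."""
    # Split the payload into blocks of at most block_size bytes.
    blocks = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
    # Each block is compressed independently with zlib.
    compressed = [zlib.compress(block) for block in blocks]
    # Size of the last block if it is shorter than block_size, otherwise 0.
    last_partial = len(raw) % block_size
    header = struct.pack(
        f"<{3 + len(blocks)}I",
        len(blocks),
        block_size,
        last_partial,
        *(len(c) for c in compressed),
    )
    # The header and the concatenated compressed blocks are encoded separately.
    encoded = base64.b64encode(header) + base64.b64encode(b"".join(compressed))
    return encoded.decode("ascii")


# The 48-byte "Points" array from the example still fits in one block.
points = struct.pack("<12f", 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1)
print(encode_vtk_binary(points))

# A payload larger than 2**15 bytes is split into several blocks.
big = bytes(range(256)) * 300  # 76800 bytes -> 3 blocks
text = encode_vtk_binary(big)
```

For the small "Points" array this reduces to the single-block header shown above. The exact compressed characters may depend on the zlib build, so only the header part of the string is guaranteed to be reproducible.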