HDF5: Do I need to explicitly set the byte order?

In some of the HDF5 examples in The HDF Group's manual, the byte order is explicitly set to 'little endian'. In other examples it is not specified at all. My question is: do I need to care about the byte order? Can I leave it unspecified and rely on the default?

A snippet from an example in which the byte order is explicitly specified:

DataSpace dataspace( RANK, dimsf );

IntType datatype( PredType::NATIVE_INT );
datatype.setOrder( H5T_ORDER_LE );

DataSet dataset = file.createDataSet( DATASET_NAME, datatype, dataspace );

What if I just use the following instead?

DataSpace dataspace( RANK, dimsf );

DataSet dataset = file.createDataSet( DATASET_NAME, PredType::NATIVE_INT, dataspace );

(I have verified that this compiles and runs, and that I get the same data back when I read it with HDFView and h5py.)

This is not a definitive answer, but I found it worth sharing (and it was too long for a comment):

From the HDF5 User's Guide, Chapter 6, HDF5 Datatypes:

2.3 Data transfer (Read and Write)

Probably the most common use of datatypes is to write or read data from a dataset or attribute. In these operations, each data element is transferred from the source to the destination (possibly rearranging the order of the elements). Since the source and destination do not need to be identical (i.e., one is disk and the other is memory) the transfer requires both the format of the source element and the destination element. Therefore, data transfers use two datatype objects, for the source and destination.

When data is written, the source is memory and the destination is disk (file). The memory datatype describes the format of the data element in the machine memory, and the file datatype describes the desired format of the data element on disk. Similarly, when reading, the source datatype describes the format of the data element on disk, and the destination datatype describes the format in memory.

In the most common cases, the file datatype is the datatype specified when the dataset was created, and the memory datatype should be the appropriate NATIVE type.

This does not contradict what was said in the earlier comments...

When writing and reading data, the HDF5 library considers two datatypes: one in memory and one on disk.

For example, consider the documentation of H5Dread:

The memory datatype of the (partial) dataset is identified by the identifier mem_type_id. (...) Datatype conversion takes place at the time of a read or write and is automatic.

The datatype "on disk" is inferred from the dataset's metadata. This is also stated in the User's Guide; see The Data Transfer Pipeline (where the relevant part is the "transform" step in the diagram) and Data Transfer: Datatype Conversion and Selection for the details.

So when reading data you do not need to care about what is "on disk"; HDF5 takes care of it (including the byte order).

Another conversion takes place if, for example, you store 64-bit floats into a 32-bit float dataset: it happens on the fly during the call to H5Dwrite.

When writing data, you can choose one of the native types HDF5 provides, or another type if you have constraints on the stored format. I have used HDF5 from C, Fortran, and Python from the start and never had to worry about any of this. (Well, only after spending a long time mastering a few HDF5 concepts.)