Converting all arrays in a PyTables/HDF5 file from float64 to float32

Question

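I have a PyTables file with a large number of subdirectories (groups). I already have a way to iterate over all of the arrays in the table and inspect their datatypes. They are all float64; I want to convert the file in place, casting every data point from float64 to float32.

According to this question, one way to overwrite an array is to assign to it. I have the following snippet, which tries to take this "count" value/array in the table, convert it to float32, and assign it back to the table: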

import h5py
import numpy as np

# filehead is a string for a file
with h5py.File(filehead, 'r+') as f:
    # Lots of stuff here ... e.g. `head` is a string
    # Note: `dset[()]` reads the whole dataset. It replaces the old
    # `.value` attribute, which was removed in h5py 3.0.
    count = f[head+'/obsnorm/Standardizer/count']

    print("/obsnorm/Standardizer/count {}".format(count))
    print("count value: {}".format(count[()]))
    # Attempt to overwrite the stored data with a float32 copy
    count[...] = count[()].astype('float32')
    print("/obsnorm/Standardizer/count {}".format(count))
    print("count value: {}".format(count[()]))

Unfortunately, the printed result is:

/obsnorm/Standardizer/count <HDF5 dataset "count": shape (), type "<f8">
count value: 512364.0
/obsnorm/Standardizer/count <HDF5 dataset "count": shape (), type "<f8">
count value: 512364.0

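In other words, before the assignment the type of count is f8, i.e. float64. After the conversion, the type is still float64.

How can I modify this data in place so that it is actually stored as float32?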
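The behavior can be reproduced on a scratch file, which suggests that slice assignment writes values into the dataset's existing storage type rather than changing that type. The following is a minimal sketch, assuming h5py and NumPy are installed; the file name `demo.h5` is made up for illustration and the value is the one printed above.

```python
import os
import tempfile

import h5py
import numpy as np

# Scratch file (hypothetical name) so the demonstration touches no real data.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Create a scalar float64 dataset, like "count" in the question.
with h5py.File(path, "w") as f:
    f.create_dataset("count", data=np.float64(512364.0))

with h5py.File(path, "r+") as f:
    dset = f["count"]
    dtype_before = dset.dtype
    # Write a float32 copy back into the same dataset. The values are
    # converted to the dataset's stored type on write.
    dset[...] = dset[()].astype("float32")
    dtype_after = dset.dtype

print(dtype_before, dtype_after)
```

Both dtypes come out the same here, matching the output in the question.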

Answer 1

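As hpaulj suggested in the comments, I decided to simply recreate a duplicate of the HDF5 file, except with the datasets created as type f4 (the same as float32), and that achieved my goal. An HDF5 dataset's type is fixed when the dataset is created, so assigning float32 values to an existing float64 dataset only casts them back to float64.

The pseudocode is as follows: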

import h5py
import numpy as np

# Open the original file jointly with new file, with `float32` at the end.
# `oldfile`, `newfile` and `head` are placeholder strings.
with h5py.File(oldfile, 'r') as f, h5py.File(newfile[:-3]+'_float32.h5', 'w') as newf:
    # `head` is some directory structure
    # Create groups to follow the same directory structure
    newf.create_group(head)

    # When it comes time to create a dataset, make the cast here.
    # (`[()]` reads the whole dataset; `.value` was removed in h5py 3.0.)
    newdata = f[head+'/name_here'][()].astype('float32')
    newf.create_dataset(head+'/name_here', data=newdata, dtype='f4')

    # Proceed for all other datasets.
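The "proceed for all other datasets" step can be automated by walking the file. The sketch below is one possible way to do it, not the code I actually ran: the function name `convert_to_float32`, the scratch file names, and the sample datasets are all hypothetical. It relies on `Group.visititems` to reach every group and dataset, and it assumes each dataset fits in memory. It does not preserve chunking or compression settings, and it leaves float64 fields inside compound types untouched.

```python
import os
import tempfile

import h5py
import numpy as np


def convert_to_float32(src_path, dst_path):
    """Copy src_path to dst_path, storing 8-byte float datasets as float32."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:

        def copy_item(name, obj):
            if isinstance(obj, h5py.Group):
                new = dst.require_group(name)
            else:
                data = obj[()]  # read the whole dataset into memory
                if obj.dtype.kind == "f" and obj.dtype.itemsize == 8:
                    # float64 dataset: cast, as in the pseudocode above
                    new = dst.create_dataset(
                        name, data=data.astype("float32"), dtype="f4")
                else:
                    # anything else keeps its original type
                    new = dst.create_dataset(name, data=data, dtype=obj.dtype)
            # Carry over the attributes of the group or dataset.
            for key, value in obj.attrs.items():
                new.attrs[key] = value

        # visititems does not visit the root group, so copy its attributes here.
        for key, value in src.attrs.items():
            dst.attrs[key] = value
        src.visititems(copy_item)


# Usage on a small scratch file (paths and contents are illustrative only).
tmp = tempfile.mkdtemp()
old_path = os.path.join(tmp, "old.h5")
new_path = os.path.join(tmp, "old_float32.h5")

with h5py.File(old_path, "w") as f:
    f.create_dataset("head/obsnorm/Standardizer/count", data=np.float64(512364.0))
    f.create_dataset("head/weights", data=np.arange(6, dtype=np.float64).reshape(2, 3))
    f.create_dataset("head/steps", data=np.arange(4, dtype=np.int64))

convert_to_float32(old_path, new_path)

with h5py.File(new_path, "r") as f:
    names = ("head/obsnorm/Standardizer/count", "head/weights", "head/steps")
    dtypes = {name: f[name].dtype for name in names}

print(dtypes)
```

Writing to a new file, rather than deleting and recreating datasets inside the old one, keeps the original intact until the copy has been checked.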

