Converting all arrays in a PyTables/HDF5 file from float64 to float32

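I have a PyTables file with a large number of subdirectories. I have a way to iterate through all the array datatypes in the table. They are float64; I would like to convert the file in place, converting all data points from float64 to float32.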

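According to this question, one way to overwrite an array is by assignment. I have the following code snippet, which tries to take the "count" value/array in the table, cast it to float32, and assign it back to the table: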

import h5py
import numpy as np

# filehead is a string for a file
with h5py.File(filehead, 'r+') as f:
    # Lots of stuff here ... e.g. `head` is a string
    path = head + '/obsnorm/Standardizer/count'

    print("/obsnorm/Standardizer/count {}".format(f[path]))
    # [()] reads the whole dataset (`.value` was removed in h5py 3.0)
    print("count value: {}".format(f[path][()]))
    # Write float32 values back into the existing dataset; h5py converts them to the dataset's own dtype
    f[path][...] = f[path][()].astype('float32')
    print("/obsnorm/Standardizer/count {}".format(f[path]))
    print("count value: {}".format(f[path][()]))

Unfortunately, the printed result is:

/obsnorm/Standardizer/count <HDF5 dataset "count": shape (), type "<f8">
count value: 512364.0
/obsnorm/Standardizer/count <HDF5 dataset "count": shape (), type "<f8">
count value: 512364.0

That is, before the assignment the type of count was f8, i.e. float64. After the conversion, the type is still float64.
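To see why the assignment changes nothing, here is a minimal self-contained reproduction. It is a sketch assuming h5py and NumPy are installed; the temporary file and the `reassign_keeps_dtype` helper are hypothetical. An HDF5 dataset's dtype is fixed when the dataset is created, so writing float32 values into an existing f8 dataset with `[...] =` just converts them back to float64:

```python
import os
import tempfile

import h5py
import numpy as np


def reassign_keeps_dtype():
    """Create an f8 dataset, assign float32 values into it, and report what is stored."""
    # Hypothetical throwaway file in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    with h5py.File(path, "w") as f:
        ds = f.create_dataset("count", data=np.arange(3, dtype=np.float64))  # dtype <f8
        # Same pattern as the question: cast to float32, then write back in place.
        ds[...] = ds[...].astype("float32")
        # The dataset keeps its creation-time dtype; the values were converted back to f8.
        return ds.dtype, ds[...]


dtype, values = reassign_keeps_dtype()
print(dtype, values)  # float64 [0. 1. 2.]
```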

How can I modify this data in place so that it is really stored as float32?
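Since an existing dataset's dtype cannot be changed, one possible in-place workaround (sketched here as an assumption, not the approach taken in this thread) is to read each float64 dataset, delete it, and recreate it under the same name as float32. `downcast_in_place` is a hypothetical helper. It copies attributes but not chunking, compression or other creation properties, and HDF5 does not reclaim the space freed by `del`, so the file will not shrink until it is rewritten (for example with the `h5repack` command-line tool).

```python
import h5py
import numpy as np


def downcast_in_place(filename):
    """Replace every float64 dataset in `filename` with a float32 dataset of the same name."""
    with h5py.File(filename, "r+") as f:
        names = []

        def collect(name, obj):
            # Only gather names here: the tree should not be modified while visititems walks it.
            if isinstance(obj, h5py.Dataset) and obj.dtype.kind == "f" and obj.dtype.itemsize == 8:
                names.append(name)

        f.visititems(collect)

        for name in names:
            old = f[name]
            data = old[()].astype(np.float32)
            attrs = dict(old.attrs)  # attributes are deleted together with the dataset
            del f[name]  # unlinks the old f8 dataset; its space is not reclaimed
            new = f.create_dataset(name, data=data)  # dtype is inferred from data: float32
            new.attrs.update(attrs)
        return names
```

Calling `downcast_in_place('data.h5')` (hypothetical file name) returns the list of dataset paths it converted.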

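As hpaulj suggested in the comments, I decided to simply recreate a duplicate HDF5 file, except with datasets of type f4 (the same as float32), and I was able to achieve my coding goal. (A dataset's dtype is fixed when the dataset is created, so assigning float32 values into the existing f8 dataset only converts them back to float64.)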

The pseudocode is as follows:

import h5py
import numpy as np

# Open the original file jointly with the new file, which gets `_float32` appended to its name.
with h5py.File(oldfile, 'r') as f, h5py.File(newfile[:-3]+'_float32.h5', 'w') as newf:
    # `head` is some directory structure
    # Create groups to follow the same directory structure
    newf.create_group(head)

    # When it comes time to create a dataset, make the cast here.
    # ([()] reads the whole dataset; `.value` was removed in h5py 3.0.)
    newdata = f[head+'/name_here'][()].astype('float32')
    newf.create_dataset(head+'/name_here', data=newdata, dtype='f4')

    # Proceed for all other datasets.
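The per-dataset steps above can be generalised to the whole file. This is a sketch assuming h5py and NumPy, with `copy_as_float32` as a hypothetical helper: it walks the source with `Group.visititems`, recreates every group, writes 8-byte float datasets as float32 and every other dataset with its original dtype, and copies attributes. Chunking, compression and other creation properties are not carried over, and special dataset types (empty datasets, references) may need extra handling.

```python
import h5py
import numpy as np


def copy_as_float32(src_path, dst_path):
    """Copy all groups and datasets from src_path to dst_path, storing float64 data as float32."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        dst.attrs.update(dict(src.attrs))  # attributes on the root group

        def copy_item(name, obj):
            # `name` is the path relative to the root, e.g. "run/obsnorm/Standardizer/count".
            if isinstance(obj, h5py.Group):
                # Recreate the group so the directory structure matches the original.
                new = dst.require_group(name)
            elif isinstance(obj, h5py.Dataset):
                if obj.dtype.kind == "f" and obj.dtype.itemsize == 8:
                    # The actual float64 -> float32 cast.
                    new = dst.create_dataset(name, data=obj[()].astype(np.float32))
                else:
                    # Any other dataset is copied with its original dtype.
                    new = dst.create_dataset(name, data=obj[()], dtype=obj.dtype)
            else:
                return None  # skip anything else, e.g. committed datatypes
            new.attrs.update(dict(obj.attrs))
            return None  # returning None tells visititems to keep walking

        src.visititems(copy_item)
```

For example, `copy_as_float32(oldfile, newfile[:-3] + '_float32.h5')` would mirror the file naming used above.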