Converting all arrays in a PyTables/HDF5 file from float64 to float32
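I have a PyTables file with a large number of subdirectories (groups). I already have a way to iterate over all the array datatypes in the table. They are float64; I want to convert the file in place, changing every data point from float64 to float32.
According to this question, one way to overwrite an array is by assignment. I have the following snippet, which tries to take this "count" value/array in the table, convert it to float32, and assign it back to the table: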
import h5py
import numpy as np

# filehead is a string for a file
with h5py.File(filehead, 'r+') as f:
    # Lots of stuff here ... e.g. `head` is a string
    # Note: `dataset[()]` replaces the `.value` property, which was removed in h5py 3.0.
    print("/obsnorm/Standardizer/count {}".format(f[head+'/obsnorm/Standardizer/count']))
    print("count value: {}".format(f[head+'/obsnorm/Standardizer/count'][()]))
    # Attempt to overwrite the stored values with a float32 copy
    f[head+'/obsnorm/Standardizer/count'][...] = (f[head+'/obsnorm/Standardizer/count'][()]).astype('float32')
    print("/obsnorm/Standardizer/count {}".format(f[head+'/obsnorm/Standardizer/count']))
    print("count value: {}".format(f[head+'/obsnorm/Standardizer/count'][()]))
Unfortunately, the printed result is:
/obsnorm/Standardizer/count <HDF5 dataset "count": shape (), type "<f8">
count value: 512364.0
/obsnorm/Standardizer/count <HDF5 dataset "count": shape (), type "<f8">
count value: 512364.0
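That is, before the assignment the type of count is f8, i.e. float64. After the conversion, the type is still float64.
How can I modify this data in place so that it is really stored as float32?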
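The behaviour above can be reproduced in a small sketch. An HDF5 dataset's type is fixed when the dataset is created, so values written into it are converted back to the stored type. One way to change the type within the same file is to delete the dataset and create a new one under the same name. The scratch file `demo.h5` and the dataset name `count` are assumptions made for illustration only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("count", data=np.float64(512364.0))

with h5py.File(path, "r+") as f:
    # Writing float32 values into an existing float64 dataset converts
    # them back to the dataset's stored type, so the dtype is unchanged.
    f["count"][...] = f["count"][()].astype("float32")
    dtype_after_assign = f["count"].dtype

    # Deleting the dataset and creating a new one under the same name
    # does change the stored type.
    data = f["count"][()].astype("float32")
    del f["count"]
    f.create_dataset("count", data=data, dtype="f4")
    dtype_after_recreate = f["count"].dtype

print(dtype_after_assign)    # float64
print(dtype_after_recreate)  # float32
```

Note that deleting a dataset generally does not shrink the file on disk; a tool such as `h5repack`, or writing a fresh file, is the usual way to reclaim the space.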
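As hpaulj suggested in the comments, I decided to simply create a duplicate HDF5 file, except that the datasets are created with type f4 (the same as float32). That achieved what I wanted.
The pseudocode is as follows: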
import h5py
import numpy as np

# Open the original file jointly with new file, with `float32` at the end.
with h5py.File(oldfile, 'r') as f, h5py.File(newfile[:-3]+'_float32.h5', 'w') as newf:
    # `head` is some directory structure
    # Create groups to follow the same directory structure
    newf.create_group(head)
    # When it comes time to create a dataset, make the cast here.
    # (`dataset[()]` replaces the `.value` property, which was removed in h5py 3.0.)
    newdata = (f[head+'/name_here'][()]).astype('float32')
    newf.create_dataset(head+'/name_here', data=newdata, dtype='f4')
    # Proceed for all other datasets.
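The "proceed for all other datasets" step could be automated with h5py's `visititems`, which walks every group and dataset in the file. The following is a minimal sketch, not the code I actually ran. The function name `copy_as_float32` is hypothetical, each dataset is read fully into memory, and chunking or compression settings are not carried over. Compound types that contain float64 fields are copied unchanged.

```python
import h5py
import numpy as np


def copy_as_float32(src_path, dst_path):
    """Copy src_path to dst_path, storing float64 datasets as float32."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        # Attributes on the root group are not seen by visititems.
        for key, value in src.attrs.items():
            dst.attrs[key] = value

        def visit(name, obj):
            if isinstance(obj, h5py.Group):
                new = dst.require_group(name)
            else:
                # Only plain float64 datasets change type; all others
                # keep the type they had in the source file.
                dtype = np.float32 if obj.dtype == np.float64 else obj.dtype
                new = dst.create_dataset(name, data=obj[()], dtype=dtype)
            for key, value in obj.attrs.items():
                new.attrs[key] = value

        src.visititems(visit)
```

A call might look like `copy_as_float32("data.h5", "data_float32.h5")`, where both file names are placeholders.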