How to store and load a Python dictionary with HDF5

Question

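I am having trouble loading a dictionary (string keys, array/list values) from an HDF5 file. I believe storing works, since the file is created and contains data. Loading raises the following error: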

ValueError: malformed node or string: < HDF5 dataset "dataset_1": shape (), type "|O" >

My code is:

import ast

import h5py
import numpy as np

def store_table(self, filename):
    table = dict()
    table['test'] = list(np.zeros(7,dtype=int))

    with h5py.File(filename, "w") as file:
        file.create_dataset('dataset_1', data=str(table))
        file.close()


def load_table(self, filename):
    file = h5py.File(filename, "r")
    data = file.get('dataset_1')
    print(ast.literal_eval(data))

I read online that ast.literal_eval should work for this, but it does not seem to help. How do I 'unpack' the HDF5 data so that it becomes a dictionary again?

Any ideas would be appreciated.

Answer 1

If I understand what you are trying to do, this should work:

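import ast

import h5py
import numpy as np


def store_table(filename):
    table = dict()
    # .tolist() yields plain Python ints, so str(table) stays parseable by
    # ast.literal_eval even when NumPy prints scalars as np.int64(0).
    table['test'] = np.zeros(7, dtype=int).tolist()

    with h5py.File(filename, "w") as file:
        file.create_dataset('dataset_1', data=str(table))


def load_table(filename):
    with h5py.File(filename, "r") as file:
        # [()] reads the scalar value; file.get() alone returns a Dataset handle.
        data = file['dataset_1'][()]
    # h5py 3.x returns bytes for stored strings; older versions return str.
    if isinstance(data, bytes):
        data = data.decode('utf-8')
    return ast.literal_eval(data)


filename = "file.h5"
store_table(filename)
data = load_table(filename)
print(data)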
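
The error in the question appears to come from passing the h5py Dataset handle itself to ast.literal_eval, rather than the text stored in it. The following is a minimal sketch of that difference, assuming h5py 3.x (where stored strings are read back as bytes); the file name demo_handle.h5 and the sample table are hypothetical and used only for illustration.

```python
import ast
import os
import tempfile

import h5py

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo_handle.h5")

table = {"test": [0, 0, 0, 0, 0, 0, 0]}

with h5py.File(path, "w") as f:
    f.create_dataset("dataset_1", data=str(table))

with h5py.File(path, "r") as f:
    dset = f["dataset_1"]                        # a Dataset handle, not the text
    is_handle = isinstance(dset, h5py.Dataset)
    raw = dset[()]                               # reads the scalar value from disk

# h5py 3.x returns bytes here; older releases may return str.
text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
restored = ast.literal_eval(text)
print(is_handle, restored == table)
```

Reading the value while the file is still open matters: once the file is closed, the handle can no longer be used to fetch data.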

Answer 2

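It is not clear to me what you are really trying to accomplish. (I suspect your dictionaries hold more than seven zeros; otherwise HDF5 is overkill for storing your data.) If you have many very large dictionaries, it is better to convert the data to NumPy arrays and then either 1) create and load the dataset in one step with data=, or 2) create the dataset with an appropriate dtype and then populate it. You can create datasets with mixed data types, which the previous solution does not address. If neither case applies, you may want to save the dictionary as attributes. Attributes can be attached to a group, a dataset, or the file object itself. Which is best depends on your requirements.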
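
The two dataset options described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the file name demo_datasets.h5, the sample table, and the two-field car_dtype are all hypothetical and not taken from the question.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name and sample data, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo_datasets.h5")
table = {"test": [0] * 7, "weights": [0.5, 1.5, 2.5]}

with h5py.File(path, "w") as f:
    # Option 1: one dataset per dictionary key, created and filled via data=.
    for key, values in table.items():
        f.create_dataset(key, data=np.asarray(values))

    # Option 2: create a dataset with an explicit (mixed) dtype, then fill it.
    car_dtype = np.dtype([("year", "i4"), ("disp", "f8")])
    cars = f.create_dataset("cars", shape=(2,), dtype=car_dtype)
    cars[...] = np.array([(1964, 260.0), (1969, 250.0)], dtype=car_dtype)

with h5py.File(path, "r") as f:
    # Rebuild the dictionary from the per-key datasets.
    restored = {key: f[key][()].tolist() for key in table}
    # Read the whole compound array, then pick one field with NumPy.
    years = f["cars"][()]["year"].tolist()

print(restored)
print(years)
```

Storing each key as its own dataset keeps the values as native numeric arrays, so no string parsing is needed when loading.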

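I wrote a short example to show how to load dictionary key/value pairs as attribute name/value pairs attached to a group. For this example, I assume the dictionary has a name key whose value is used as the name of the associated group. The procedure is almost identical for a dataset or the file object (just change the object reference).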

import h5py

def load_dict_to_attr(h5f, thisdict):
    # The 'name' entry supplies the group name, so it is required.
    if 'name' not in thisdict:
        print('Dictionary missing name key. Skipping function.')
        return

    dname = thisdict.get('name')
    if dname in h5f:
        print('Group:' + dname + ' exists. Skipping function.')
        return

    grp = h5f.create_group(dname)
    # Store every key/value pair as an attribute on the new group.
    for key, val in thisdict.items():
        grp.attrs[key] = val

###########################################

def get_grp_attrs(name, node):
    # Callback for visititems(): collect the node's attributes into a dict.
    grp_dict = {}
    for k in node.attrs.keys():
        grp_dict[k] = node.attrs[k]

    print(grp_dict)

###########################################

car1 = dict( name='my_car', brand='Ford', model='Mustang', year=1964,
             engine='V6',  disp=260,  units='cu.in' )
car2 = dict( name='your_car', brand='Chevy', model='Camaro', year=1969,
             engine='I6',  disp=250,  units='cu.in' )
car3 = dict( name='dads_car', brand='Mercedes', model='350SL', year=1972,
             engine='V8',  disp=4520, units='cc' )
car4 = dict( name='moms_car', brand='Plymouth', model='Voyager', year=1989,
             engine='V6',  disp=289,  units='cu.in' )

a_truck = dict(             brand='Dodge', model='RAM', year=1984,
               engine='V8', disp=359, units='cu.in' )

garage = dict(my_car=car1, 
              your_car=car2,
              dads_car=car3,
              moms_car=car4,
              a_truck=a_truck )

with h5py.File('SO_61226773.h5','w') as h5w:

    for car in garage:
        print ('\nLoading dictionary:', car)
        load_dict_to_attr(h5w, garage.get(car))

with h5py.File('SO_61226773.h5','r') as h5r:

    print ('\nReading dictionaries from Group attributes:')
    h5r.visititems (get_grp_attrs)
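
The example above prints each group's attributes. If the goal is to get real dictionaries back, a small variation can return them instead. The sketch below assumes the same layout (one group per dictionary, one attribute per key); the helper name attrs_to_dict and the file name demo_attrs.h5 are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def attrs_to_dict(node):
    """Copy an object's attributes into a plain dict of Python values."""
    result = {}
    for key, val in node.attrs.items():
        # Numeric attributes come back as NumPy scalars; .item() unwraps them.
        result[key] = val.item() if isinstance(val, np.generic) else val
    return result


# Hypothetical file name and sample dictionary, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo_attrs.h5")
car = dict(name="my_car", brand="Ford", year=1964, disp=260)

with h5py.File(path, "w") as f:
    grp = f.create_group(car["name"])
    for key, val in car.items():
        grp.attrs[key] = val

with h5py.File(path, "r") as f:
    # One dictionary per top-level group, keyed by group name.
    garage = {name: attrs_to_dict(f[name]) for name in f}

print(garage)
```

Attributes suit small metadata like this; large arrays are usually better stored as datasets.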
