How to read HDF5 files in Python

Question

I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data inside it.

My code

import h5py    
import numpy as np    
f1 = h5py.File(file_name,'r+')

This works and the file is opened. But how can I access the data inside the file object f1?

Answer 1

You can use pandas.

import pandas as pd
pd.read_hdf(filename,key)

Answer 2

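What you need to do is create a dataset. If you look at the quick start guide, it shows that you need to use the file object to create a dataset. So call f.create_dataset, and then you can read the data. This is explained in the docs.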

Answer 3

Reading HDF5

import h5py
filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]

    # Get the data
    data = list(f[a_group_key])

Writing HDF5

import h5py

# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("group_name", data=data_matrix)

For more information, see the h5py docs.

Alternatives
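To check that the two snippets above fit together, here is a minimal round-trip sketch. The file name and the dataset name are placeholders picked for this example, and the file is written to a temporary directory so nothing existing gets overwritten.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "roundtrip.hdf5")
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write the array as a single dataset at the root of the file.
with h5py.File(path, "w") as f:
    f.create_dataset("dataset_name", data=data_matrix)

# Read it back; [()] copies the whole dataset into a NumPy array,
# so the result stays usable after the file is closed.
with h5py.File(path, "r") as f:
    restored = f["dataset_name"][()]

print(restored.shape)
print(np.array_equal(restored, data_matrix))
```

Reading with [()] inside the with block matters: a dataset object itself stops being readable once the file is closed, while the NumPy copy does not.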

JSON: nice for writing human-readable data; very commonly used (read & write)
CSV: super simple format
pickle: a Python serialization format (read & write)
MessagePack (Python package): more compact representation
HDF5 (Python package): nice for matrices
XML: exists too *sigh* (read & write)

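For your application, the following might be important:

Support by other programming languages
Reading / writing performance
Compactness (file size)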

See also: Comparison of data serialization formats

If you are instead looking for a way to make configuration files, you might want to read my short article Configuration files in Python.

Answer 4

Reading the file

import h5py

f = h5py.File(file_name, mode)

Study the structure of the file by printing the HDF5 groups it contains

for key in f.keys():
    print(key) #Names of the groups in HDF5 file.

Extracting the data

#Get the HDF5 group
group = f[key]

#Checkout what keys are inside that group.
for key in group.keys():
    print(key)

data = group[some_key_inside_the_group][()]
#Do whatever you want with data

#After you are done
f.close()
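The steps above assume you already know how deeply the groups are nested. When you don't, a small recursive helper can list every dataset path for you. This is only a sketch: list_datasets and the file layout below are made up for illustration and are not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(node, prefix=""):
    """Recursively collect the full paths of all datasets below `node`."""
    paths = []
    for name, item in node.items():
        full_name = f"{prefix}/{name}"
        if isinstance(item, h5py.Group):
            # A group can contain further groups and datasets.
            paths.extend(list_datasets(item, full_name))
        elif isinstance(item, h5py.Dataset):
            paths.append(full_name)
    return paths


# Build a small nested file so the sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), "nested.hdf5")
with h5py.File(path, "w") as f:
    grp = f.create_group("measurements")
    grp.create_dataset("x", data=np.arange(5))
    f.create_dataset("top_level", data=np.zeros(3))

with h5py.File(path, "r") as f:
    dataset_paths = list_datasets(f)

print(dataset_paths)
```

Each returned path can be passed straight back to the file object, for example f["/measurements/x"][()].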

Answer 5

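Use the code below to read the data and convert it into NumPy arrays

import h5py
import numpy as np

f1 = h5py.File('data_1.h5', 'r')
print(list(f1.keys()))
X1 = f1['x']
y1 = f1['y']
# Dataset.value was removed in h5py 3.0; [()] reads the whole dataset instead.
df1 = np.array(X1[()])
dfy1 = np.array(y1[()])
print(df1.shape)
print(dfy1.shape)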
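Before converting a dataset to a NumPy array it can help to look at its shape and dtype, which h5py exposes without loading the data, and to read only a slice if the dataset is large. A minimal sketch, assuming a throwaway file with datasets named 'x' and 'y' (the sizes and dtypes here are invented for the example):

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example file with two datasets, 'x' and 'y'.
path = os.path.join(tempfile.mkdtemp(), "data_example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=np.zeros((100, 4), dtype="float32"))
    f.create_dataset("y", data=np.arange(100))

with h5py.File(path, "r") as f:
    x_dset = f["x"]
    # Shape and dtype are metadata; reading them does not load the array.
    x_shape, x_dtype = x_dset.shape, x_dset.dtype
    # Slicing reads only the selected rows into memory.
    x_head = x_dset[:10]
    # [()] reads the complete dataset.
    y_all = f["y"][()]

print(x_shape, x_dtype, x_head.shape, y_all.shape)
```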

Answer 6

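To read the raw contents of an .hdf5 file as an array, you can do the following

import numpy as np
# Note: np.fromfile interprets the file's raw bytes, including HDF5 headers
# and metadata; it does not parse the HDF5 structure.
myarray = np.fromfile('file.hdf5', dtype=float)
print(myarray)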
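An HDF5 file stores headers and metadata alongside the array data, so a reader that understands the format is generally needed to get the original values back. A minimal sketch, with a made-up file and dataset name, that checks the file type with h5py.is_hdf5 and then reads a dataset through h5py:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset name, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "file.hdf5")
values = np.array([1.0, 2.0, 3.0, 4.0])
with h5py.File(path, "w") as f:
    f.create_dataset("values", data=values)

# is_hdf5 inspects the file signature without loading any data.
looks_like_hdf5 = h5py.is_hdf5(path)

# The file on disk is larger than the raw array because of the HDF5 metadata.
file_size = os.path.getsize(path)

with h5py.File(path, "r") as f:
    restored = f["values"][()]

print(looks_like_hdf5, file_size, values.nbytes)
print(restored)
```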

Answer 7

Here is a simple function I just wrote that reads an .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:

import h5py

def read_hdf5(path):

    weights = {}

    keys = []
    with h5py.File(path, 'r') as f:  # open file
        f.visit(keys.append)  # append all keys to list
        for key in keys:
            if ':' in key:  # contains data if ':' in key
                print(f[key].name)
                # Dataset.value was removed in h5py 3.0; use [()] instead
                weights[f[key].name] = f[key][()]
    return weights

https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.

I haven't tested it thoroughly, but it does the job for me.
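If you would rather not rely on ':' appearing in the key, one alternative is to test each object's type while visiting the file. This is a sketch of a different technique, not the function from the gist: read_all_datasets is a hypothetical helper, and the group and dataset names below only imitate Keras-style names, they are not taken from a real save_weights file.

```python
import os
import tempfile

import h5py
import numpy as np


def read_all_datasets(path):
    """Return a dict mapping each dataset's path to its contents."""
    arrays = {}

    def collect(name, obj):
        # visititems calls this for every group and dataset in the file;
        # returning None tells h5py to keep visiting.
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return arrays


# Hypothetical layout, written only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "weights_example.hdf5")
with h5py.File(path, "w") as f:
    layer = f.create_group("dense_1")
    layer.create_dataset("kernel:0", data=np.ones((4, 2)))
    layer.create_dataset("bias:0", data=np.zeros(2))

weights = read_all_datasets(path)
for name, array in weights.items():
    print(name, array.shape)
```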

Answer 8

from keras.models import load_model 

h= load_model('FILE_NAME.h5')

Answer 9

Using some of the answers to this question and the latest doc, I was able to extract my numerical array using

import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]

where 'x' is simply the X coordinate in my case.
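When the group name is known in advance, the same dataset can also be reached with a single slash-separated path. A small sketch, assuming a made-up group called 'scan' that holds a dataset 'x':

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file with one group ('scan') holding a dataset 'x'.
path = os.path.join(tempfile.mkdtemp(), "coords.h5")
with h5py.File(path, "w") as f:
    f.create_group("scan").create_dataset("x", data=np.linspace(0.0, 1.0, 5))

with h5py.File(path, "r") as h5f:
    first_key = list(h5f.keys())[0]
    # Step by step: first group, then the dataset inside it.
    via_index = h5f[first_key]["x"][()]
    # Equivalent lookup with a slash-separated path.
    via_path = h5f["scan/x"][()]

print(first_key, np.array_equal(via_index, via_path))
```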

Answer 10

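If you have named datasets in the HDF file, you can use the following code to read those datasets and convert them into NumPy arrays:

import h5py
import numpy as np

file = h5py.File('filename.h5', 'r')

xdata = file.get('xdata')
xdata = np.array(xdata)

If your file is in a different directory, you can prepend the path to 'filename.h5'.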
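A short sketch of both points, using a temporary directory as the hypothetical "different directory". It also shows that get returns None for a name that does not exist, which is worth checking before converting to an array. The name 'does_not_exist' is just a placeholder.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical directory; os.path.join prepends it to the file name.
directory = tempfile.mkdtemp()
path = os.path.join(directory, "filename.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("xdata", data=np.arange(3))

with h5py.File(path, "r") as f:
    # Convert while the file is still open.
    xdata = np.array(f.get("xdata"))
    # get returns None instead of raising KeyError for a missing name.
    missing = f.get("does_not_exist")

print(xdata, missing)
```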
