Open .h5 file in Python

I am trying to read an h5 file in Python.

The file can be found at this link; it is called 'vstoxx_data_31032014.h5'. The code I am trying to run is from the book Python for Finance by Yves Hilpisch and is as follows:

import pandas as pd
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
futures_data = h5['futures_data']  # VSTOXX futures data
options_data = h5['options_data']  # VSTOXX call option data
h5.close()

I get the following error:

h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
Traceback (most recent call last):

  File "<ipython-input-692-dc4e79ec8f8b>", line 1, in <module>
    h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')

  File "C:\Users\Laura\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 466, in __init__
    self.open(mode=mode, **kwargs)

  File "C:\Users\Laura\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 637, in open
    raise IOError(str(e))

OSError: HDF5 error back trace

  File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5F.c", line 604, in H5Fopen
    unable to open file
  File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5Fint.c", line 1085, in H5F_open
    unable to read superblock
  File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5Fsuper.c", line 277, in H5F_super_read
    file signature not found

End of HDF5 error back trace

Unable to open/create file 'path.../vstoxx_data_31032014.h5'

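For the purposes of this question, I have replaced my working directory with 'path.../'.

Does anyone know where this error might come from?

To open an HDF5 file with the h5py module, you can use h5py.File(filename). The documentation can be found here.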

import h5py

filename = "vstoxx_data_31032014.h5"

h5 = h5py.File(filename,'r')

futures_data = h5['futures_data']  # VSTOXX futures data
options_data = h5['options_data']  # VSTOXX call option data

h5.close()
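
The traceback in the question ends with "file signature not found", which is what the HDF5 library reports when it cannot find the HDF5 format signature in the file. That can happen when a download is incomplete or when a web page was saved under the .h5 name, although the thread does not confirm the cause here. The following is a minimal sketch that checks for the signature using only the standard library; the helper name looks_like_hdf5 is hypothetical and not part of pandas or h5py.

```python
from pathlib import Path

# The 8-byte signature that marks the start of an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, ...

    Hypothetical helper for illustration only.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # The format allows the signature at 0, 512, 1024, 2048, ...
            offset = 512 if offset == 0 else offset * 2
    return False
```

Calling looks_like_hdf5('vstoxx_data_31032014.h5') and getting False would suggest that the file on disk is not a valid HDF5 file, in which case neither pandas nor h5py can open it and downloading it again is the first thing to try.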

import os

os.chdir('path of your working directory')  # change to the directory that contains the file
wd = os.getcwd()  # os.chdir() returns None, so query the current working directory separately
print(wd)
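
As an alternative to changing the working directory, the file path can be resolved and checked before it is handed to pandas or h5py. This is only a sketch: the function name resolve_data_file and its directory parameter are hypothetical, and it assumes that a missing or empty file should raise an error.

```python
from pathlib import Path


def resolve_data_file(filename, directory=None):
    """Return the absolute path of `filename`, raising if it is missing or empty.

    Hypothetical helper; `directory` defaults to the current working directory.
    """
    base = Path(directory) if directory is not None else Path.cwd()
    path = (base / filename).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if path.stat().st_size == 0:
        raise ValueError(f"File is empty: {path}")
    return path
```

The returned path can then be passed on, for example as pd.HDFStore(str(path), 'r'), which avoids depending on whatever the working directory happens to be.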

if __name__ == '__main__':
    # import the required library
    import h5py

    # open the file read-only; the with block closes it again afterwards
    with h5py.File("hdf5 file with its path", "r") as f:
        # print the name of every top-level group or dataset in the file
        for name in f.keys():
            print(name)