Setting and reading dimscale in hdf5 files correctly in python

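I'm trying to attach dimension scales to a dataset that I want to store in an HDF5 file from Python, but I get an error when I try to print the attributes after setting them. The relevant code snippet is: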

import h5py as h5
import numpy as np

# create data and x-axis
my_data = np.random.randint(10, size=(100, 200))
x_axis  = np.linspace(0, 1, 100)

h5f = h5.File('my_file.h5','w')
h5f.create_dataset( 'data_1', data=my_data )
h5f['data_1'].dims[0].label = 'm'
h5f['data_1'].dims.create_scale( h5f['x_axis'], 'x' )

# the following line is creating the problems
h5f['data_1'].dims[0].attach_scale( h5f['x_axis'] )

# this is where the crash happens but only if the above line is included
for ii in h5f['data_1'].attrs.items():
    print ii

h5f.close()

Running print(h5.version.info) gives the following output:

Summary of the h5py configuration
---------------------------------

h5py    2.2.1
HDF5    1.8.11
Python  2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2]
sys.platform    linux2
sys.maxsize     9223372036854775807
numpy   1.8.2

The error message is:

Traceback (most recent call last):
  File "HDF_write_dimScales.py", line 16
    for ii in h5f['data_1'].attrs.items():
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/base.py", line 347, in items
    return [(x, self.get(x)) for x in self]
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/base.py", line 310, in get
    return self[name]
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 55, in __getitem__
    rtdt = readtime_dtype(attr.dtype, [])
  File "h5a.pyx", line 318, in h5py.h5a.AttrID.dtype.__get__ (h5py/h5a.c:4285)
  File "h5t.pyx", line 337, in h5py.h5t.TypeID.py_dtype (h5py/h5t.c:3892)
TypeError: No NumPy equivalent for TypeVlenID exists

Any ideas or hints would be greatly appreciated.

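This is only a guess, but since the error refers to TypeVlenID, it may be related to h5py's incomplete implementation of vlen types (especially in the version of the module you are using).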

Inexplicable behavior when using vlen with h5py

Writing to compound dataset with variable length string via h5py (HDF5)
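One way to test the vlen guess without triggering the failing conversion is to list only the attribute names (which does not read their values) and inspect the class of each attribute's low-level HDF5 datatype. This is a minimal sketch, not a fix: the helper name `attr_type_classes` is hypothetical, `attrs.get_id()` and `AttrID.get_type()` are h5py's low-level accessors, and the sketch assumes a recent h5py (2.9 or later, for `Dataset.make_scale`). It uses an in-memory file (`driver='core', backing_store=False`) so nothing is written to disk.

```python
import h5py
import numpy as np

def attr_type_classes(obj):
    """Hypothetical helper: map each attribute name of obj to the class name
    of its low-level HDF5 datatype, without reading the attribute value."""
    classes = {}
    for name in obj.attrs.keys():          # listing names does not read values
        attr_id = obj.attrs.get_id(name)   # low-level h5py.h5a.AttrID
        classes[name] = type(attr_id.get_type()).__name__
    return classes

# Rebuild the structure from the question in an in-memory file.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as h5f:
    h5f['data_1'] = np.random.randint(10, size=(100, 200))
    h5f['x_axis'] = np.linspace(0, 1, 100)
    h5f['x_axis'].make_scale('x')          # assumes h5py >= 2.9
    h5f['data_1'].dims[0].attach_scale(h5f['x_axis'])
    types = attr_type_classes(h5f['data_1'])

print(types)
```

If the guess is right, `DIMENSION_LIST` (the attribute that `attach_scale` creates) should report `TypeVlenID`, the type named in the traceback, which would explain why the crash only appears once the scale is attached.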

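It works for me on h5py 2.5.0 with a few small tweaks. The problem is probably your call to create_scale: with h5py 2.5.0, that create_scale() call raises a KeyError for h5f['x_axis'], because no dataset named x_axis exists in the file yet (x_axis is only a NumPy array in memory). To get your example working, I had to create the x_axis dataset explicitly first.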
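The KeyError can be reproduced in isolation. A minimal sketch, assuming a recent h5py and an in-memory file (`driver='core', backing_store=False`): a Python variable named x_axis does not create an HDF5 object named 'x_axis', so looking it up by name fails until the array has actually been written into the file.

```python
import h5py
import numpy as np

x_axis = np.linspace(0, 1, 100)   # exists only in Python memory, as in the question

with h5py.File('keyerror.h5', 'w', driver='core', backing_store=False) as h5f:
    h5f.create_dataset('data_1', data=np.random.randint(10, size=(100, 200)))

    # Looking up a name that was never written to the file raises KeyError.
    try:
        h5f['x_axis']
        lookup_failed = False
    except KeyError:
        lookup_failed = True

    # After writing the array under that name, the lookup succeeds.
    h5f['x_axis'] = x_axis
    stored_shape = h5f['x_axis'].shape

print(lookup_failed, stored_shape)
```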

import h5py
import numpy as np

# create data and x-axis
my_data = np.random.randint(10, size=(100, 200))

# Use a context manager to ensure h5f is closed
with h5py.File('my_file.h5','w') as h5f:
    h5f.create_dataset( 'data_1', data=my_data )

    # Create the x_axis dataset directly in the HDF5 file
    h5f['x_axis']  = np.linspace(0, 1, 100)

    h5f['data_1'].dims[0].label = 'm'

    # Now we can create and attach the scale without problems
    h5f['data_1'].dims.create_scale( h5f['x_axis'], 'x' )
    h5f['data_1'].dims[0].attach_scale( h5f['x_axis'] )

    for ii in h5f['data_1'].attrs.items():
        print(ii)

# Output
#(u'DIMENSION_LABELS', array(['m', ''], dtype=object))
#(u'DIMENSION_LIST', array([array([<HDF5 object reference>], dtype=object),
#       array([], dtype=object)], dtype=object))
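Since the question is also about reading the scales back, the `dims` API can return the label, the scale names and the scale data directly, without going through `attrs.items()`. This is a sketch assuming h5py 2.9 or later (for `Dataset.make_scale`; on older versions `dset.dims.create_scale(h5f['x_axis'], 'x')` plays that role) and an in-memory file.

```python
import h5py
import numpy as np

with h5py.File('scales.h5', 'w', driver='core', backing_store=False) as h5f:
    h5f['data_1'] = np.random.randint(10, size=(100, 200))
    h5f['x_axis'] = np.linspace(0, 1, 100)

    dset = h5f['data_1']
    dset.dims[0].label = 'm'
    h5f['x_axis'].make_scale('x')          # assumes h5py >= 2.9
    dset.dims[0].attach_scale(h5f['x_axis'])

    # Read everything back through the dims API instead of raw attributes.
    label = dset.dims[0].label             # label of dimension 0
    scale_names = dset.dims[0].keys()      # names of scales attached to dimension 0
    x_values = dset.dims[0][0][...]        # first attached scale, as a NumPy array
    n_scales_dim1 = len(dset.dims[1])      # number of scales on dimension 1

print(label, scale_names, x_values.shape, n_scales_dim1)
```

Here `label` should be `'m'`, `scale_names` should be `['x']`, and dimension 1 has no scales attached.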

If you still have problems, you may want to upgrade to h5py 2.5.0, which handles VLEN types better (though still not perfectly).