How to fix the "no such node" error in PyTables and h5py

Question

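I built an HDF5 dataset with PyTables. It contains thousands of nodes, each of which is an image stored without compression (shape 512x512x3). When I run a deep learning training loop (with a PyTorch DataLoader), it crashes at random, saying that a node does not exist. However, the missing node is never the same one, and when I open the file myself to check whether the node is there, it always is.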

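I ran everything sequentially, because I thought the problem might come from multithreaded/multiprocess access to the file. That did not fix it. I tried many other things, and none of them worked.

Does anyone know what to do? Should I add a timer between calls to give the machine time to reallocate the file?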

Originally I used only PyTables, but to work around the problem I tried loading the file with h5py. Unfortunately, it did not work any better.

This is the error I get with h5py: "RuntimeError: Unable to get link info (bad symbol table node signature)"

The exact error may vary, but every time it says "bad symbol table node signature".

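PS: I cannot share the full code because it is large and part of a bigger code base that belongs to my company. I can still share the part below to show how I load the images: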

with h5py.File(dset_filepath, "r", libver='latest', swmr=True) as h5file:
    node = h5file["/train_group_0/sample_5"] # <- this line breaks
    target = node.attrs.get('TITLE').decode('utf-8')
    img = Image.fromarray(np.uint8(node))
    return img, int(target.strip())

Answer 1

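Before accessing a dataset (node), add a test to confirm that it exists. While you are adding checks, do the same for the attribute 'TITLE'. If you are going to use hard-coded path names (like 'group_0'), you should check that every node in the path exists (for example, does 'group_0' exist?), or use one of the recursive visitor functions (.visit() or .visititems()) to make sure you only access existing nodes.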
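To illustrate the "check every node in the path" idea, here is a minimal sketch. The helper name first_missing_node is hypothetical and not part of h5py. It assumes only that the object you pass in supports `name in obj` and `obj[name]`, which open h5py files and groups do, so it can report which component of a hard-coded path is the first one that cannot be found.

```python
def first_missing_node(root, path):
    """Return the first path prefix that does not exist under root, or None.

    `root` can be any mapping-like object that supports `name in obj`
    and `obj[name]`, such as an open h5py.File or h5py.Group.
    """
    current = root
    walked = ''
    # Ignore empty components produced by leading or doubled slashes.
    for name in (part for part in path.split('/') if part):
        walked = f'{walked}/{name}'
        # Objects without keys() (e.g. datasets) have no children to look up.
        if not hasattr(current, 'keys') or name not in current:
            return walked
        current = current[name]
    return None
```

With an open file, `first_missing_node(h5file, '/train_group_0/sample_5')` would return None when the whole path exists, or a prefix such as '/train_group_0' when the group itself is the part that is missing.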

The modified h5py code with basic checks looks like this:

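import h5py
import numpy as np
from PIL import Image

def load_sample(dset_filepath, sample='sample_5'):  # wrapper name is illustrative
    with h5py.File(dset_filepath, 'r', libver='latest', swmr=True) as h5file:
        # Confirm the dataset exists before reading it.
        if sample not in h5file['/train_group_0']:
            print(f'Dataset Read Error: {sample} not found')
            return None, None
        node = h5file[f'/train_group_0/{sample}']  # <- the line that breaks in the question
        img = Image.fromarray(np.uint8(node))
        # Confirm the attribute exists before reading it.
        if 'TITLE' not in node.attrs:
            print('Attribute Read Error: TITLE not found')
            return img, None
        title = node.attrs['TITLE']
        # TITLE may come back as bytes or str depending on how it was written.
        target = title.decode('utf-8') if isinstance(title, bytes) else title
        return img, int(target.strip())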
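As an alternative to hard-coded names, the visitor approach mentioned above could look roughly like the sketch below. The function name list_dataset_paths is hypothetical, and building the list once and drawing sample names from it is an assumption about how you might structure the loader, not something taken from the question. It relies on Group.visititems(), which calls the callback with each member's name and object.

```python
import h5py

def list_dataset_paths(h5file):
    """Collect the absolute path of every dataset under an open h5py file or group."""
    paths = []

    def collect(name, obj):
        # Keep datasets only; groups are visited but not recorded.
        if isinstance(obj, h5py.Dataset):
            paths.append(obj.name)
        return None  # returning None lets the traversal continue

    h5file.visititems(collect)
    return sorted(paths)
```

Every path in the returned list was found during the traversal, so a loader that indexes into this list never asks for a name that was not seen when the index was built.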

You said you were using PyTables. Here is code that does the same thing with the PyTables package:

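import numpy as np
import tables as tb
from PIL import Image

def load_sample(dset_filepath, sample='sample_5'):  # wrapper name is illustrative
    with tb.open_file(dset_filepath, mode='r') as h5file:
        # Confirm the node exists before reading it.
        if sample not in h5file.get_node('/train_group_0'):
            print(f'Dataset Read Error: {sample} not found')
            return None, None
        node = h5file.get_node(f'/train_group_0/{sample}')  # <- the line that breaks in the question
        img = Image.fromarray(np.uint8(node))
        # Confirm the attribute exists before reading it.
        if 'TITLE' not in node._v_attrs:
            print('Attribute Read Error: TITLE not found')
            return img, None
        title = node._v_attrs['TITLE']
        # TITLE may come back as bytes or str depending on the PyTables version.
        target = title.decode('utf-8') if isinstance(title, bytes) else title
        return img, int(target.strip())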
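The same "only visit nodes that exist" idea can be sketched in PyTables with File.walk_nodes(), which iterates over every node below a starting group. The function name list_leaf_paths is hypothetical, and opening the file once just to build this list is an assumption about how you might use it.

```python
import tables as tb

def list_leaf_paths(h5_path):
    """Return the absolute path of every leaf (array or table) in a PyTables file."""
    with tb.open_file(h5_path, mode='r') as h5file:
        return sorted(
            node._v_pathname
            for node in h5file.walk_nodes('/')
            if isinstance(node, tb.Leaf)  # skip groups, keep data-bearing nodes
        )
```

Each returned path can be passed to h5file.get_node() later, so the loader does not depend on hard-coded sample names.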

pytables

python-3.x

h5py

pytorch