PyTorch - RuntimeError: [enforce fail at inline_container.cc:209] . file not found: archive/data.pkl

Problem

I am trying to load a file with PyTorch, but the error says that archive/data.pkl does not exist.

Code

import torch
cachefile = 'cacheddata.pth'
torch.load(cachefile)

Output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-8edf1f27a4bd> in <module>
      1 import torch
      2 cachefile = 'cacheddata.pth'
----> 3 torch.load(cachefile)

~/opt/anaconda3/envs/matching/lib/python3.8/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    582                     opened_file.seek(orig_position)
    583                     return torch.jit.load(opened_file)
--> 584                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    585         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    586 

~/opt/anaconda3/envs/matching/lib/python3.8/site-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, **pickle_load_args)
    837 
    838     # Load the data (which may in turn use `persistent_load` to load tensors)
--> 839     data_file = io.BytesIO(zip_file.get_record('data.pkl'))
    840     unpickler = pickle_module.Unpickler(data_file, **pickle_load_args)
    841     unpickler.persistent_load = persistent_load

RuntimeError: [enforce fail at inline_container.cc:209] . file not found: archive/data.pkl

Hypothesis

My guess is that this has something to do with pickle. From the docs:

This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.

Versions

It turned out the file was somehow corrupted. After generating it again, it loaded without any problem.
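
One way to confirm this kind of corruption before calling torch.load is to inspect the file with Python's standard zipfile module. The sketch below assumes the file was written in the zip-based format that the traceback above goes through (the branch that calls zip_file.get_record('data.pkl')), and that the pickled payload is stored under a name ending in data.pkl. The helper name inspect_checkpoint is hypothetical, not part of PyTorch.

```python
import zipfile


def inspect_checkpoint(path):
    """Return a short diagnosis for a file assumed to be a zip-format checkpoint."""
    if not zipfile.is_zipfile(path):
        # A truncated file usually loses the central directory stored at the
        # end of the archive, so it fails this check. Files written by a
        # non-zip (legacy) serializer also end up here.
        return "not a readable zip archive"
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # Assumption: the pickled object lives at "<top-level dir>/data.pkl".
        if not any(name.endswith("data.pkl") for name in names):
            return "zip archive without data.pkl"
        bad_member = archive.testzip()  # first member with a bad CRC, or None
        if bad_member is not None:
            return f"corrupt member: {bad_member}"
    return "ok"
```

Calling inspect_checkpoint('cacheddata.pth') on the file from the question would be a quick first check; any result other than "ok" suggests regenerating or re-transferring the file rather than debugging the loading code.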

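I ran into the same problem. I downloaded a model (.pt) that had been trained on a GPU directly from a notebook on GCP AI Platform. When I loaded it locally with torch.load('models/model.pt', map_location=device), I got this error: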

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

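I noticed that the downloaded file was much smaller than expected. As with @Ian, the file had been corrupted while downloading from the notebook. In the end, instead of downloading it directly, I had to transfer the file from the notebook to a bucket on Google Cloud Storage (GCS) first and then download it from GCS. Now it works.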
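
A size mismatch like this can be checked explicitly by computing the same fingerprint on both machines and comparing the results. This is a minimal sketch using only the standard library; the function name file_fingerprint and the chunk size are illustrative choices, not something from the answers here.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Run it once in the notebook where the model was saved and once on the downloaded copy. If the two tuples differ, the transfer damaged the file and torch.load is not at fault.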

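I ran into this problem not with a single file but with every file I was working with. Looking at the file sizes, you could tell they were corrupted because they were too small and incomplete, but why were they always being created that way?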

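I think the problem was that I had made a harmless modification to a simple class I was saving. I had created a class Foo, kept its data the same but added some methods, and then tried to save an old instance when I only had the updated class definition of Foo.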

Here is an example of what I think happened, although it does not exactly reproduce the problem:

import torch

class Foo(object):
  def __init__(self):
    self.contents = [1,2,3]

torch.save(Foo(), "foo1.pth")

foo1 = torch.load("foo1.pth") # saved with class version 1 of Foo

# some days later the code looks like this
class Foo(object):
  def __init__(self):
    self.contents = [1,2,3]
  def __len__(self):
    return len(self.contents)

foo1 = torch.load("foo1.pth") # still works
torch.save(foo1, "foo2.pth") # try to save version 1 object where class is no longer known

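At first I got an error like PicklingError: Can't pickle <class '__main__.Foo'>: it's not the same object as __main__.Foo, but with Jupyter Notebook's autoreload feature in use it is hard to tell exactly what happened. Usually, objects of an older class load into a newer class definition without any problem.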

Whatever actually happened, my solution was to load the old version and manually copy the data fields into a newly instantiated Foo, for example:

old = torch.load("foo1.pth")
new = Foo()
# new = old # this was the code that caused issues
new.contents = old.contents
torch.save(new, "foo2.pth")

In my case, my disk drive was full.
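
A full disk can leave a truncated file behind, so it may help to check the free space before saving. This is a rough sketch with the standard library only; has_free_space is a hypothetical helper, and the required size is something you would have to estimate yourself.

```python
import os
import shutil


def has_free_space(target_path, required_bytes):
    """Return True if the filesystem holding `target_path` has enough free bytes."""
    # The directory that will contain the file must already exist.
    directory = os.path.dirname(os.path.abspath(target_path))
    return shutil.disk_usage(directory).free >= required_bytes
```

For example, has_free_space('cacheddata.pth', 2 * 1024**3) asks whether roughly 2 GiB is available where the file would be written.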

In my case, the main cause of this error was a corrupted .pt file. I started downloading the file while it was still being created.

So, to avoid the error, copy the .pt file to another directory and then download it from that directory.
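
A related way to avoid ever exposing a half-written file is to write to a temporary file first and rename it once the write has finished. This is a different technique from the copy described above, sketched here with the standard library only. The helper atomic_write and its write_fn parameter are hypothetical names.

```python
import os
import tempfile


def atomic_write(path, write_fn):
    """Call write_fn(file_object) on a temp file, then rename it to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            write_fn(handle)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the partial temp file and leave any existing `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Since torch.save accepts a file-like object as well as a path, a call such as atomic_write('model.pt', lambda f: torch.save(model, f)) should work, where model stands for whatever object you are saving. Anything that downloads model.pt then sees either the previous complete file or the new complete file.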