Subclass of pathlib.Path loses its custom attributes after pickle.load

from pathlib import Path
import pickle
class P(type(Path())):
    def __init__(self, *args):
        super().__init__()
        self.a = ''
p = P()
p.a = 'x'
with open('xx', 'wb') as wf:
    pickle.dump(p, wf)
p1 = pickle.load(open('xx', 'rb'))
print(p1.a)            # here p1.a is ''

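I am creating a subclass of pathlib.Path and want to add some custom attributes to it. The problem is that the custom attributes are lost once the object is reloaded with pickle. How can I solve this?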

Another solution I have tried:

class File():
    def __init__(self, *args):
        self.path = Path(*args)
    def __getattr__(self, item):
        return getattr(self.path, item)
p = File('aaa')
p.exists()  # no error
with open('xx', 'wb') as wf:
    pickle.dump(p, wf)
p1 = pickle.load(open('xx', 'rb'))
# RecursionError: maximum recursion depth exceeded.
# This happens because __getattr__ accesses self.path at a moment when
# 'path' is not yet in self.__dict__, so __getattr__ calls itself again.

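One way is to use the copyreg module to associate a pickle support function with instances of your class, as shown below. Note that I also had to change how your P class handles its arguments: it no longer ignores them.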

import copyreg
from pathlib import Path
import pickle


class P(type(Path())):
    def __init__(self, *args):
        super().__init__()
        self.a = args[0] if args else ''


def pickle_P(p):
    print("pickling a P instance...")
    return P, (p.a,)

copyreg.pickle(P, pickle_P)

p = P()
p.a = 'x'
q = P('y')

with open('xx', 'wb') as outp:
    pickle.dump(p, outp)
    pickle.dump(q, outp)

with open('xx', 'rb') as inp:
    p1 = pickle.load(inp)
    q1 = pickle.load(inp)

print('p1.a = {!r}'.format(p1.a))
print('q1.a = {!r}'.format(q1.a))

Output:

pickling a P instance...
pickling a P instance...
p1.a = 'x'
q1.a = 'y'

Regarding the inheritance problem

Here is another solution, following up on the comments on @martineau's answer.

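If I am right, the problem is caused by the __reduce__ method in pathlib.PosixPath. The pickling behavior seems to be determined by this method. @martineau's solution using copyreg.pickle(P, pickle_P) is related to this method as well: pickle_P and __reduce__ have the same return pattern.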

Here is the documentation on the return value of __reduce__:

When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or None can be provided as their value. The semantics of each item are in order:

  • A callable object that will be called to create the initial version of the object.

  • A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument.

  • Optionally, the object’s state, which will be passed to the object’s __setstate__() method as previously described. If the object has no such method then, the value must be a dictionary and it will be added to the object’s __dict__ attribute.

  • ...

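The second item explains how @martineau's solution works: the second return value is passed as arguments to the callable (here the class P), so it ends up in __init__.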
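To illustrate the third item in the quoted list, here is a minimal sketch with a plain class that has nothing to do with pathlib. The class name Tagged and its attributes are made up for this example. The callable is invoked with the argument tuple first (so __init__ runs and resets tag), and the state dictionary is then merged into __dict__ because the class defines no __setstate__.

```python
import pickle


class Tagged:
    """Hypothetical class used only to show the shape of __reduce__."""

    def __init__(self, name):
        self.name = name
        self.tag = ''  # reset on every construction, like P.a in the question

    def __reduce__(self):
        # (callable, arguments for the callable, state merged into __dict__)
        return self.__class__, (self.name,), self.__dict__


t = Tagged('n')
t.tag = 'x'

# Round trip through pickle: Tagged('n') is called, then the state is applied.
t2 = pickle.loads(pickle.dumps(t))
print(t2.name, t2.tag)
```

Without the third item, t2.tag would come back as '' because only __init__ would run.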

Here is the source code of PosixPath.__reduce__:

    def __reduce__(self):
        # Using the parts tuple helps share interned path parts
        # when pickling related paths.
        # self._parts holds the path components built from the arguments passed to Path
        return (self.__class__, tuple(self._parts))

Based on the description of the third return value, the solution is:

from pathlib import Path
import pickle
class P(type(Path())):
    def __init__(self, *args):
        super().__init__()
        self.a = ''
    def __reduce__(self):
        return self.__class__, tuple(self._parts), self.__dict__

p = P()
p.a = 'x'
with open('xx', 'wb') as wf:
    pickle.dump(p, wf)
p1 = pickle.load(open('xx', 'rb'))
print(p1.a)            # here p1.a is 'x'

Drawbacks of this solution:

  • Instances of P will have a __dict__ attribute (Path uses __slots__).
  • The attribute named _hash will be ignored.

Regarding the composition problem

A note in the pickle documentation may explain the cause of the error.

Note At unpickling time, some methods like __getattr__(), __getattribute__(), or __setattr__() may be called upon the instance. In case those methods rely on some internal invariant being true, the type should implement __new__() to establish such an invariant, as __init__() is not called when unpickling an instance.
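The failure can be reproduced without pickle at all. The sketch below assumes that unpickling creates the instance through __new__ and skips __init__, as the note says, and imitates that by calling File.__new__ directly. The names bare and recursion_hit are only for illustration.

```python
from pathlib import Path


class File:
    def __init__(self, *args):
        self.path = Path(*args)

    def __getattr__(self, item):
        # Called only when normal lookup fails. If 'path' itself is missing,
        # evaluating self.path re-enters __getattr__ again and again.
        return getattr(self.path, item)


# Build an instance without running __init__, so 'path' is never set.
bare = File.__new__(File)

try:
    bare.anything  # any failed lookup triggers the loop
except RecursionError:
    recursion_hit = True
else:
    recursion_hit = False

print(recursion_hit)
```

Any attribute lookup that misses on such a half-built instance ends the same way, which is why the error shows up during pickle.load.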

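To make sure the path attribute exists by the time __getattr__ is called, one solution is to move the attribute assignment into the __new__ method (which runs before __init__).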

class File():
    def __new__(cls, *args):
        obj = super().__new__(cls)
        obj.path = Path(*args)
        return obj
    def __getattr__(self, item):
        return getattr(self.path, item)
p = File('aaa')
p.exists()  # no error
with open('xx', 'wb') as wf:
    pickle.dump(p, wf)
p1 = pickle.load(open('xx', 'rb')) # no error
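As a quick check, the following sketch round-trips such an object in memory and confirms that both the wrapped path and an extra attribute survive. The attribute a is added here only for illustration, and pickle.dumps/pickle.loads are used instead of a file to keep the example short.

```python
import pickle
from pathlib import Path


class File:
    def __new__(cls, *args):
        # Set 'path' in __new__ so it exists even when __init__ is skipped.
        obj = super().__new__(cls)
        obj.path = Path(*args)
        return obj

    def __getattr__(self, item):
        # Delegate everything not found on File to the wrapped Path.
        return getattr(self.path, item)


f = File('aaa')
f.a = 'x'  # custom attribute stored in the instance __dict__

f1 = pickle.loads(pickle.dumps(f))
print(f1.path == Path('aaa'), f1.a)
```

On load, __new__ first creates a placeholder path from no arguments, and the pickled instance state then overwrites it with the original one.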