Recombining an object after pickling and unpickling

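I have a situation where one party (Alice) has a complex custom object whose attributes are complicated and may involve circular references. Alice sends this object to two separate parties, Bob and Claire, by pickling it and sending it through an (encrypted) socket. Each of them then modifies one attribute of the object, but what they change includes complex references to the object they received from Alice. Bob and Claire then pickle their own modified object and send it back to Alice.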

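The question is, how can Alice combine the changes made by both Bob and Claire? Because object identity is not preserved across pickling/unpickling, the naive idea of copying the attribute that Bob or Claire created onto the original object does not work. I am aware of how persistent_id() and persistent_load() work in pickling, but I would very much like to avoid manually writing rules for every single attribute of the object that Alice creates. This is partly because it is a large pile of nested and circularly referenced objects (some 10,000+ lines of them), and partly because I want the flexibility to modify the rest of the code without having to change how I pickle/unpickle each time (and the difficulty of properly testing that).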
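To make the identity problem concrete, here is a minimal sketch that uses built-in containers as stand-ins for Alice's object graph (the names `shared`, `alice` and `received` are illustrative only). It shows that a single pickle keeps shared and circular references consistent inside the copy, but that nothing in the copy is the same object as the original.

```python
import pickle

# Built-in containers stand in for Alice's object graph (illustrative names).
shared = {"x": 2, "y": 3}
alice = {"shared": shared, "nested": [shared]}
alice["self"] = alice  # circular reference

received = pickle.loads(pickle.dumps(alice))

# Inside one pickle, shared and circular references stay consistent...
print(received["self"] is received)                  # True
print(received["nested"][0] is received["shared"])   # True
# ...but the copy shares no objects with the original.
print(received["shared"] is shared)                  # False
print(received["shared"] == shared)                  # True: equal contents only
```

This is why an attribute that comes back from Bob or Claire still points at their copies of everything else rather than at Alice's originals.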

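Can this be done? Or do I have to bite the bullet and deal with the pickling "manually"?

Here is a minimal concrete example. Obviously, the circular references could easily be removed here, or Bob and Claire could just send their value over to Alice, but that is not the case in my real situation.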

import pickle


class Shared:
    pass


class Bob:
    pass


class Claire:
    pass


class Alice:

    def __init__(self):
        self.shared = Shared()
        self.bob = Bob()
        self.claire = Claire()

    def add_some_data(self, x, y):
        self.shared.bob = self.bob
        self.shared.claire = self.claire
        self.shared.x = x
        self.shared.y = y

    def bob_adds_data(self, extra):
        self.bob.value = self.shared.x + self.shared.y + extra

    def claire_adds_data(self, extra):
        self.claire.value = self.shared.x * self.shared.y * extra


# Done on Alice's side
original = Alice()
original.add_some_data(2, 3)
outgoing = pickle.dumps(original)


# Done on Bob's side
bobs_copy = pickle.loads(outgoing)
bobs_copy.bob_adds_data(4)
bobs_reply = pickle.dumps(bobs_copy)


# Done on Claire's side
claires_copy = pickle.loads(outgoing)
claires_copy.claire_adds_data(5)
claires_reply = pickle.dumps(claires_copy)


# Done on Alice's side
from_bob = pickle.loads(bobs_reply)
from_claire = pickle.loads(claires_reply)
original.bob = from_bob.bob
original.claire = from_claire.claire
# If the circular references were maintained, these two lines would print the same values.
# Instead, the attributes on the bottom line do not exist because the reference is broken.
print(original.bob.value, original.claire.value)
print(original.shared.bob.value, original.shared.claire.value)

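Partial solution

I have a partial solution that works with some restrictions on the problem case.

Restrictions

The restriction is that Alice's object has only a single reference to Bob's object and to Claire's object, each in a known place. The latter two, however, can have arbitrarily complex references to themselves and to Alice, including circular, nested and recursive structures. The other requirement is that Bob does not have any reference to Claire and vice versa: this is quite natural if we require both of those objects to be updated independently and in any order.

In other words, Alice receives something from Bob which gets placed in a single tidy place. The difficulty is in making the references contained within Bob match up to the correct objects that Alice contains, but nothing else in Alice itself needs to change. This is the use case I need, and it is not clear to me that the more general case is possible if Bob and Claire can make arbitrary changes to Alice.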

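Idea

It works by having a base class which creates a persistent id that does not change over the lifetime of the object, is maintained by pickling/unpickling, and is unique. Any object whose references are to be maintained in this scenario must inherit from this class. When Bob sends his changes to Alice, he pickles using a dictionary of all the objects he received from Alice and their persistent ids, so that all references to pre-existing objects are encoded by the persistent id. On the other side, Alice does the same. She unpickles what Bob sent her using a dictionary that maps persistent ids to objects, built from everything she previously sent to Bob. Thus, while Alice and Bob have different instances of everything, the persistent ids of some objects are the same, so they can be "swapped out" when pickling between different parties.

This can be made to work with existing code fairly easily. It just consists of adding a base class to all custom classes that we want to make persistent, plus a small addition every time we pickle/unpickle.

Module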

import io
import time
import pickle


class Persistent:

    def __init__(self):
        """Both unique and unchanging, even after modifying or pickling/unpickling object
        Potential problem if clocks are not in sync"""
        self.persistent_id = str(id(self)) + str(time.time())


def make_persistent_memo(obj):
    """Makes two dictionaries (one the reverse of the other) linking every instance of Persistent found
    in the attributes and collections of obj recursively, with the persistent id of that instance.
    Can cope with circular references and recursively nested objects"""

    def add_to_memo(item, id_to_obj, obj_to_id, checked):

        # Prevents checking the same object multiple times
        if id(item) in checked:
            return id_to_obj, obj_to_id, checked
        else:
            checked.add(id(item))

            if isinstance(item, Persistent):
                id_to_obj[item.persistent_id] = item
                obj_to_id[item] = item.persistent_id

        try:  # Try to add attributes of item to memo, recursively
            for sub_item in vars(item).values():
                add_to_memo(sub_item, id_to_obj, obj_to_id, checked)
        except TypeError:
            pass

        try:  # Try to add iterable elements of item to memo, recursively
            for sub_item in item:
                add_to_memo(sub_item, id_to_obj, obj_to_id, checked)
        except TypeError:
            pass

        return id_to_obj, obj_to_id, checked

    return add_to_memo(obj, {}, {}, set())[:2]


class PersistentPickler(pickle.Pickler):
    """ Normal pickler, but it takes a memo of the form {obj: persistent id}
    any object in that memo is pickled as its persistent id instead"""

    @staticmethod  # Because dumps is not defined for custom Picklers
    def dumps(obj_to_id_memo, obj):
        with io.BytesIO() as file:
            PersistentPickler(file, obj_to_id_memo).dump(obj)
            file.seek(0)
            return file.read()

    def __init__(self, file, obj_to_id_memo):
        super().__init__(file)
        self.obj_to_id_memo = obj_to_id_memo

    def persistent_id(self, obj):
        try:
            if obj in self.obj_to_id_memo and obj:
                return self.obj_to_id_memo[obj]
        except TypeError:  # If obj is unhashable
            pass
        return None


class PersistentUnPickler(pickle.Unpickler):
    """ Normal unpickler, but it takes a memo of the form {persistent id: obj}
    used to undo the effects of PersistentPickler"""

    @staticmethod  # Because loads is not defined for custom Unpicklers
    def loads(id_to_obj_memo, pickled_data):
        with io.BytesIO(pickled_data) as file:
            obj = PersistentUnPickler(file, id_to_obj_memo).load()
        return obj

    def __init__(self, file, id_to_obj_memo):
        super().__init__(file)
        self.id_to_obj_memo = id_to_obj_memo

    def persistent_load(self, pid):
        if pid in self.id_to_obj_memo:
            return self.id_to_obj_memo[pid]
        else:
            # Unknown id: defer to the base class and propagate whatever it returns or raises
            return super().persistent_load(pid)
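One thing that may be worth checking in the traversal above: as far as I can tell, the second try block of add_to_memo iterates over a container directly, and iterating over a dict yields only its keys. The following standalone sketch illustrates that behaviour with a hypothetical `Marker` class standing in for a Persistent instance and a hypothetical helper `reachable_by_iteration` that mimics plain iteration. It is not part of the module itself.

```python
class Marker:
    """Hypothetical stand-in for a Persistent instance."""


def reachable_by_iteration(container):
    """Mimics the plain `for sub_item in item` loop: returns what iteration yields."""
    try:
        return list(container)
    except TypeError:  # Not iterable
        return []


marker = Marker()
as_list = [marker]
as_dict = {"key": marker}

print(marker in reachable_by_iteration(as_list))  # True: list elements are visited
print(marker in reachable_by_iteration(as_dict))  # False: only the key "key" is visited
print(marker in as_dict.values())                 # True: values need an explicit .values() pass
```

If that reading is right, an object that is reachable only as a dict value would not end up in the memo unless dicts are handled separately, for example by also walking `item.values()` when `item` is a dict.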

Usage example

class Alice(Persistent):
    """ Must have a single attribute saved as bob or claire """

    def __init__(self):
        super().__init__()
        self.shared = Shared()
        self.bob = Bob()
        self.claire = Claire()

    def add_some_data(self, x, y):
        self.nested = [self]
        self.nested.append(self.nested)
        self.shared.x = x
        self.shared.y = y


class Bob(Persistent):
    """ Can have arbitrary reference to itself and to Alice but must not touch Claire """

    def make_changes(self, alice, extra):
        self.value = alice.shared.x + alice.shared.y + extra
        self.attribute = alice.shared
        self.collection = [alice.bob, alice.shared]
        self.collection.append(self.collection)
        self.new = Shared()


class Claire(Persistent):
    """ Can have arbitrary reference to itself and to Alice but must not touch Bob """

    def make_changes(self, alice, extra):
        self.value = alice.shared.x * alice.shared.y * extra
        self.attribute = alice
        self.collection = {"claire": alice.claire, "shared": alice.shared}
        self.collection["collection"] = self.collection


class Shared(Persistent):
    pass


# Done on Alice's side
alice = Alice()
alice.add_some_data(2, 3)
outgoing = pickle.dumps(alice)

# Done on Bob's side
bobs_copy = pickle.loads(outgoing)
# Create a memo of the persistent_id of the received objects that are *not* being modified
_, bob_pickling_memo = make_persistent_memo(bobs_copy)
bob_pickling_memo.pop(bobs_copy.bob)
# Make changes and send everything back to Alice
bobs_copy.bob.make_changes(bobs_copy, 4)
bobs_reply = PersistentPickler.dumps(bob_pickling_memo, bobs_copy.bob)


# Same on Claire's side
claires_copy = pickle.loads(outgoing)

_, claire_pickling_memo = make_persistent_memo(claires_copy)
claire_pickling_memo.pop(claires_copy.claire)

claires_copy.claire.make_changes(claires_copy, 5)
claires_reply = PersistentPickler.dumps(claire_pickling_memo, claires_copy.claire)


# Done on Alice's side
alice_unpickling_memo, _ = make_persistent_memo(alice)
alice.bob = PersistentUnPickler.loads(alice_unpickling_memo, bobs_reply)
alice.claire = PersistentUnPickler.loads(alice_unpickling_memo, claires_reply)

# Check that Alice has received changes from Bob and Claire
print(alice.bob.value == bobs_copy.bob.value == 9,
      alice.claire.value == claires_copy.claire.value == 30)
# Check that all references match up as expected
print("Alice:", alice is alice.nested[0] is alice.nested[1][0] is alice.claire.attribute)

print("Bob:", (alice.bob is alice.nested[0].bob is alice.bob.collection[0] is
               alice.bob.collection[2][0]))

print("Claire:", (alice.claire is alice.nested[0].claire is alice.claire.collection["claire"] is
                  alice.claire.collection["collection"]["claire"]))

print("Shared:", (alice.shared is alice.bob.attribute is alice.bob.collection[1] is
                  alice.bob.collection[2][1] is alice.claire.collection["shared"] is
                  alice.claire.collection["collection"]["shared"] is not alice.bob.new))

Output

C>python test.py
True True
Alice: True
Bob: True
Claire: True
Shared: True

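Exactly as required.

Follow-up

It feels like I am reinventing the wheel by doing my own nested introspection. Can this be done better with existing tools?

My code feels rather inefficient, with a lot of introspection. Can this be improved?

Can I be sure that add_to_memo() is not missing some references?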

Using time.time() to create the persistent id feels rather clunky. Is there a better alternative?
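On the last question, one candidate is the standard library's uuid module. The sketch below is a hypothetical variant of the Persistent base class, not a tested replacement: it swaps the id()-plus-clock string for uuid.uuid4().hex, which is random and so does not depend on memory addresses or on the parties' clocks being in sync.

```python
import uuid


class Persistent:
    """Hypothetical variant of the base class, using a random UUID as the id."""

    def __init__(self):
        # 32 hex characters from a random (version 4) UUID; collisions are
        # possible in principle but extremely unlikely in practice.
        self.persistent_id = uuid.uuid4().hex


a = Persistent()
b = Persistent()

print(a.persistent_id != b.persistent_id)  # distinct objects get distinct ids
print(len(a.persistent_id))                # 32
```

The id is still an ordinary instance attribute, so it should travel through pickling in the same way as the time-based id in the module above. Being a hashable string, it can also be used as a dictionary key in the memos without further changes.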