What is the proper way to make an object with unpickable fields pickable?

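What I've been doing is detecting which fields are unpicklable and turning them into strings. I guess I could also delete them, but then the namespace would wrongly tell me the field doesn't exist, and I'd rather have it exist as a string. But I'd like to know whether there is a more official way to do this.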

The code I'm currently using:

import argparse
from argparse import Namespace
from typing import Any

import dill


def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpicklable objects as strings.

    note: implementation not tested against deep copying.
    ref:
        - 
    """
    pickable_args = argparse.Namespace()
    # - go through the fields in args; if a value is not picklable, make it a string, else leave it as is
    # The vars() function returns the __dict__ attribute of the given object.
    for field in vars(args):
        field_val: Any = getattr(args, field)
        if not dill.pickles(field_val):
            field_val: str = str(field_val)
        setattr(pickable_args, field, field_val)
    return pickable_args

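Context: I think I'm mostly doing this to get rid of the annoying TensorBoard object I carry around (though I don't think I need the .tb field anymore, thanks to wandb/Weights & Biases). Not that it matters much, but context is always nice.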

Related:


Edit:

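I decided to drop dill, because sometimes it fails to restore classes/objects (probably because it can't save their code or something), so I'm now using plain pickle (which seems to be the recommended way of doing it in PyTorch).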

So what is the official (perhaps optimized) way to check for picklables without dill, i.e. with the official pickle?

Is this the best way:

import pickle


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

So my current solution is:

import argparse
from argparse import Namespace
from typing import Any

import dill


def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpicklable objects as strings.

    note: implementation not tested against deep copying.
    ref:
        - 
    """
    pickable_args = argparse.Namespace()
    # - go through the fields in args; if a value is not picklable, make it a string, else leave it as is
    # The vars() function returns the __dict__ attribute of the given object.
    for field in vars(args):
        field_val: Any = getattr(args, field)
        # - if the current field value is not picklable, make it picklable by casting it to a string
        if not dill.pickles(field_val):
            field_val: str = str(field_val)
        elif not is_picklable(field_val):
            field_val: str = str(field_val)
        # - after this line the invariant is that the value is picklable, so set it in the new args obj
        setattr(pickable_args, field, field_val)
    return pickable_args


def make_opts_pickable(opts):
    """ Makes a namespace picklable """
    return make_args_pickable(opts)


def is_picklable(obj: Any) -> bool:
    """
    Checks if something is picklable.

    Ref:
        - 
    """
    import pickle
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

Note: one reason I want an "official"/tested approach is that PyCharm pauses on the try/except, which is not what I want... I want it to stop only on unhandled exceptions.

Yes, try/except is the best way to go about this.

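According to the docs, pickle pickles objects recursively: if you pickle a list of picklable objects, it pickles every object in that list as part of pickling the list. This means you can't test whether an object is picklable without actually pickling it. Therefore your structure: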

import pickle


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

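is the simplest way to check this. If you are not using recursive structures and/or you can safely assume that all recursive structures will contain only picklable objects, you can check the object's type() value against the list of picklable objects: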

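  • None, True, and False
  • integers, floating-point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see Pickling Class Instances for details)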

This may be faster than the try: ... except: ... you showed in your question.
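As a rough illustration of the type()-based check, here is a minimal sketch. The name looks_picklable and the exact whitelist are hypothetical, not part of the pickle API. It only approves built-in scalars and containers, so it can return False for objects pickle would accept (top-level functions, classes, user-defined instances), and it assumes there are no self-referencing containers.

```python
from typing import Any

# Hypothetical whitelist: built-in scalar types that pickle always handles.
_ATOMIC_TYPES = {type(None), bool, int, float, complex, str, bytes, bytearray}
# Built-in containers that are picklable if everything inside them is.
_CONTAINER_TYPES = {tuple, list, set, frozenset}


def looks_picklable(obj: Any) -> bool:
    """Conservative type()-based check that never calls pickle itself.

    True means "built only from the whitelisted types above".
    False means "not proven picklable" (pickle might still succeed).
    Self-referencing containers are NOT handled and will hit RecursionError.
    """
    obj_type = type(obj)  # exact type, so subclasses are not trusted
    if obj_type in _ATOMIC_TYPES:
        return True
    if obj_type in _CONTAINER_TYPES:
        return all(looks_picklable(item) for item in obj)
    if obj_type is dict:
        return all(looks_picklable(k) and looks_picklable(v) for k, v in obj.items())
    return False


print(looks_picklable({"lr": 0.1, "layers": [64, 32], "name": None}))  # True
print(looks_picklable({"callback": lambda x: x}))  # False
```

The trade-off is that a False result only means "not proven picklable", so you would still need a real pickle attempt for anything outside the whitelist.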

What is the proper way to make an object with unpickable fields pickable?

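I believe the answer to this belongs in the question you linked (Python - How can I make this un-pickleable object pickleable?). I added an answer to that question explaining how to make an unpicklable object picklable the right way, without using __reduce__.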

So what is the official (perhaps optimized) way to check for pickables without dill or with the official pickle?

The docs define picklable objects as follows:

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).

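The tricky parts are (1) knowing how the functions/classes were defined (you can probably use the inspect module for that) and (2) recursively traversing the object, checking it against the rules above.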
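For part (1), one possible approximation is sketched below; the helper name is_importable_by_reference is hypothetical. It relies on the fact that pickle stores functions and classes by reference, as a module name plus a qualified name. If looking that name up again does not return the very same object (for example a lambda, or a function defined inside another function), pickle cannot save it by reference.

```python
import inspect
import sys
from typing import Any


def is_importable_by_reference(obj: Any) -> bool:
    """Sketch: can a function/class be found again via module + __qualname__?

    Approximates the "defined at the top level of a module" rule; it does not
    look at protocol versions or copyreg registrations.
    """
    if not (inspect.isfunction(obj) or inspect.isclass(obj) or inspect.isbuiltin(obj)):
        return False
    qualname = getattr(obj, "__qualname__", "")
    # Lambdas and functions nested inside other functions cannot be looked up by name.
    if "<lambda>" in qualname or "<locals>" in qualname:
        return False
    module = sys.modules.get(getattr(obj, "__module__", None) or "")
    if module is None:
        return False
    # Walk dotted qualified names such as "Outer.Inner".
    target: Any = module
    for part in qualname.split("."):
        target = getattr(target, part, None)
        if target is None:
            return False
    return target is obj


print(is_importable_by_reference(len))  # True
print(is_importable_by_reference(lambda: 0))  # False
```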

There are many caveats to this, such as pickle protocol versions, and whether the object is an extension type (defined in a C extension like numpy, for example) or an instance of a "user-defined" class. Using __slots__ can also affect whether an object is picklable (since __slots__ means there's no __dict__), but such objects can be pickled with __getstate__. Some objects may also be registered with custom pickling functions (e.g. via copyreg), so you'd need to know whether that has happened too.
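To illustrate the registration caveat, here is a sketch using the standard copyreg module; the Worker class and _reduce_worker are hypothetical. The same type goes from unpicklable to picklable once a reduction function is registered, which no type()-based check would notice. It assumes the code runs as a script, so that pickle can look Worker up by name.

```python
import copyreg
import pickle
import threading


class Worker:
    """Hypothetical class: its lock attribute makes instances unpicklable by default."""

    def __init__(self, name: str):
        self.name = name
        self.lock = threading.Lock()


def _reduce_worker(worker: Worker):
    # Rebuild from the name only; __init__ creates a fresh lock on unpickling.
    return (Worker, (worker.name,))


def can_pickle(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


before = can_pickle(Worker("a"))  # default path tries to pickle the lock and fails
copyreg.pickle(Worker, _reduce_worker)  # register a custom reduction function
after = can_pickle(Worker("a"))  # pickle now uses _reduce_worker instead
print(before, after)
```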

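Technically, you could implement a function in Python that checks all of this, but it would be slow by comparison. The simplest (and probably most performant, since pickle is implemented in C) way to do this is simply to try to pickle the object you want to check.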
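A small standard-library-only illustration of why actually pickling is the reliable test: pickle walks into containers, so one unpicklable element nested deep inside makes the whole object fail. The exception type also varies; in CPython a thread lock typically raises TypeError rather than PicklingError, which is why this sketch catches Exception broadly.

```python
import pickle
import threading

# Everything here is picklable except the lock nested inside "extras".
config = {"lr": 0.1, "layers": [64, 32], "extras": [1, 2, threading.Lock()]}

try:
    pickle.dumps(config)
    outcome = "picklable"
except Exception as exc:  # the exception type is not always PicklingError
    outcome = f"unpicklable ({type(exc).__name__}: {exc})"

print(outcome)
```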

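I tested this by pickling all sorts of things in PyCharm... it doesn't halt with this approach. The key is that you have to anticipate almost every type of exception (see footnote 3 in the docs). The warnings are optional; they're mostly there to explain the context of this issue.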

import pickle
import warnings
from typing import Any


def is_picklable(obj: Any) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, pickle.PickleError, AttributeError, ImportError):
        # https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
        return False
    except RecursionError:
        warnings.warn(
            f"Could not determine if object of type {type(obj)!r} is picklable "
            "due to a RecursionError that was suppressed. "
            "Setting a higher recursion limit MAY allow this object to be pickled"
        )
        return False
    except Exception as e:
        # https://docs.python.org/3/library/pickle.html#id9
        warnings.warn(
            f"An error occurred while attempting to pickle "
            f"object of type {type(obj)!r}. Assuming it's unpicklable. The exception was {e}"
        )
        return False

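Using the example from my other answer linked above, you can adapt your make_args_pickable by implementing __getstate__ and __setstate__ (either by subclassing and adding them, or by making a wrapper class)...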

from argparse import Namespace

# is_picklable() is the function defined above.


class Unpicklable:
    """
    A simple marker class so we can distinguish when a deserialized object
    is a string because it was originally unpicklable
    (and not simply a string to begin with)
    """
    def __init__(self, obj_str: str):
        self.obj_str = obj_str

    def __str__(self):
        return self.obj_str

    def __repr__(self):
        return f'Unpicklable(obj_str={self.obj_str!r})'


class PicklableNamespace(Namespace):
    def __getstate__(self):
        """For serialization"""

        # always make a copy so you don't accidentally modify state
        state = self.__dict__.copy()

        # Any unpicklables will be converted to an ``Unpicklable`` object
        # with its str format stored in the object
        for key, val in state.items():
            if not is_picklable(val):
                state[key] = Unpicklable(str(val))
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)  # or leave unimplemented

To see it in action, I'll pickle a namespace with an attribute holding a file handle (normally not picklable) and then load the pickled data.

import pickle

# Normally file handles are not picklable (assumes test.txt exists)
p = PicklableNamespace(f=open('test.txt'))

data = pickle.dumps(p)
del p

loaded_p = pickle.loads(data)
# PicklableNamespace(f=Unpicklable(obj_str="<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>"))

For me, whatever the error is, I want my function to tell me the object is not picklable. So doing this seems to work:

from typing import Any


def is_picklable(obj: Any) -> bool:
    """
    Checks if something is picklable.

    Ref:
        - 
        - pycharm halting all the time issue: 
    """
    import pickle
    try:
        pickle.dumps(obj)
    except:
        return False
    return True
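One caveat with the bare except: above is that it also swallows KeyboardInterrupt and SystemExit, so Ctrl+C during a long pickling attempt would just be reported as "not picklable". A variant that still treats every ordinary error as unpicklable, but lets those two propagate, catches Exception instead. This is a sketch; whether PyCharm's exception breakpoints treat it the same way as the bare except: has not been verified here.

```python
import pickle
import threading
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if pickle.dumps(obj) succeeds, False on any ordinary error.

    Catching Exception instead of using a bare except: lets KeyboardInterrupt
    and SystemExit propagate, so Ctrl+C still stops a long pickling attempt.
    """
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


print(is_picklable({"lr": 0.1, "layers": [64, 32]}))  # True
print(is_picklable(threading.Lock()))  # False
```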

Also, as an added bonus, it doesn't make PyCharm halt unexpectedly (see the PyCharm reference in the docstring for details).