How can I make a Python dataclass hashable without making it immutable?
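Say I have a dataclass in Python 3. I want to be able to hash and order these objects, but I don't want them to be immutable.
I only want them ordered/hashed on id.
I see in the docs that I can just implement __hash__ and all that, but I'd like to get dataclasses to do the work for me, since they are intended to handle this.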
from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

a = sorted(list(set([ Category(id='x'), Category(id='y')])))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Category'
TL;DR
Use frozen=True in conjunction with eq=True (which will make the instances immutable).
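As a rough sketch of what that looks like for the class in the question (the sample ids and the assignment attempt are only there for illustration), the frozen variant can be put in a set and sorted, but assigning to a field raises FrozenInstanceError:

```python
from dataclasses import dataclass, field, FrozenInstanceError

# frozen=True together with eq=True makes dataclass generate __hash__ from the compared fields.
@dataclass(frozen=True, eq=True, order=True)
class Category:
    id: str
    name: str = field(default="set this in post_init", compare=False)

# Hashing and ordering both work, and both only look at `id`.
cats = sorted({Category(id="y"), Category(id="x")})
ids = [c.id for c in cats]

# The price: instances can no longer be modified.
try:
    cats[0].id = "z"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(ids, mutated)
```

This is exactly the trade-off the question wants to avoid, so it is only useful if immutability turns out to be acceptable after all.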
Long answer
From the docs:

__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer's intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.

By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.

If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.

If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
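To illustrate the TypeError mentioned in the quote, here is a small sketch. The class name Conflicting is made up for the example, and the exact wording of the error message may differ between Python versions:

```python
from dataclasses import dataclass

# Combining an explicit __hash__ with unsafe_hash=True is rejected
# at class-creation time, i.e. when the decorator runs.
try:
    @dataclass(unsafe_hash=True)
    class Conflicting:
        x: int

        def __hash__(self):
            return 0

    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

So the two approaches are alternatives: either let dataclass generate the hash via unsafe_hash=True, or write __hash__ yourself and leave unsafe_hash at its default.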
From the docs:

Here are the rules governing implicit creation of a __hash__() method:
[...]
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.
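The three rules quoted above can be checked directly. This is a minimal sketch with made-up class names (Frozen, Mutable, NoEq), one per rule:

```python
from dataclasses import dataclass

@dataclass(eq=True, frozen=True)   # rule 1: __hash__ is generated from the fields
class Frozen:
    x: int

@dataclass(eq=True)                # rule 2: __hash__ is set to None
class Mutable:
    x: int

@dataclass(eq=False)               # rule 3: __hash__ is left untouched
class NoEq:
    x: int

# Equal frozen instances hash equally because the hash is field-based.
frozen_hash_matches = hash(Frozen(1)) == hash(Frozen(1))

# This is the situation in the question: the class is marked unhashable.
mutable_hash_is_none = Mutable.__hash__ is None

# With eq=False the class simply inherits object's identity-based hash.
noeq_uses_object_hash = NoEq.__hash__ is object.__hash__

print(frozen_hash_matches, mutable_hash_is_none, noeq_uses_object_hash)
```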
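You have 3 options:
- Set frozen=True (in addition to eq=True), which will make your class immutable and hashable.
- Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, thus risking problems if an instance of your class is modified while stored in a dict or set:

    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'
    print(cat in categories)  # False

- Manually implement a __hash__ method.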
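The third option might look roughly like this for the class in the question. This is a sketch, not the only way to write it; hashing on id alone is an assumption taken from the question's "ordered/hashed on id":

```python
from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str
    name: str = field(default="set this in post_init", compare=False)

    # An explicitly defined __hash__ is left alone by dataclass().
    # It must stay consistent with __eq__, which compares only `id` here.
    def __hash__(self):
        return hash(self.id)

a = sorted(set([Category(id="y"), Category(id="x")]))
ids = [c.id for c in a]

# Changing a field that is not part of the hash is harmless.
cat = a[0]
bucket = {cat}
cat.name = "renamed"
still_found = cat in bucket

print(ids, still_found)
```

The same caveat as with unsafe_hash=True applies: changing id while the object sits in a set or is used as a dict key will make it unfindable.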
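I'd like to add a special note on using unsafe_hash.
You can exclude fields from being hashed by setting compare=False or hash=False. (hash by default inherits from compare.)
This might be useful if you store nodes in a graph but want to mark them visited without breaking their hashing (e.g. if they're in a set of unvisited nodes...).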
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class node:
    x: int
    visit_count: int = field(default=10, compare=False)  # hash inherits compare setting. So valid.
    # visit_count: int = field(default=False, hash=False)  # also valid. Arguably easier to read, but can break some compare code.
    # visit_count: int = False  # if mutated, hashing breaks. (3* printed)

s = set()
n = node(1)
s.add(n)
if n in s: print("1* n in s")
n.visit_count = 11
if n in s:
    print("2* n still in s")
else:
    print("3* n is lost to the void because hashing broke.")
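Applied to the Category class from the question, the same idea could look like the sketch below. It assumes name is never meant to take part in comparison or hashing, as in the original field definitions:

```python
from dataclasses import dataclass, field

# unsafe_hash=True forces a generated __hash__ while the class stays mutable.
# Only fields with compare=True (here: id) feed into the hash.
@dataclass(unsafe_hash=True, order=True)
class Category:
    id: str
    name: str = field(default="set this in post_init", compare=False)

a = sorted(set([Category(id="y"), Category(id="x")]))
sorted_ids = [c.id for c in a]

# Mutating the excluded field leaves the hash unchanged.
a[0].name = "renamed"
same_hash = hash(a[0]) == hash(Category(id="x"))

print(sorted_ids, same_hash)
```

Mutating id itself is still allowed and would still break set and dict lookups, so this only helps if id is treated as fixed by convention.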
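It took me hours to figure this out... Useful further reading I found is the Python documentation on dataclasses. Specifically, see the field documentation and the dataclass argument documentation.
https://docs.python.org/3/library/dataclasses.html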