Python

Question

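I am using a Python class's __hash__ method together with set() to prevent objects with the same hash from being added to a set more than once. According to the Python data model documentation, set() treats objects with the same hash as the same object and adds them only once.

But it behaves differently, as shown below: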

class MyClass(object):

    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())

print(len(result)) # len = 2

With string values, however, it works as expected.

result.add('aida')
result.add('aida')

print(len(result)) # len = 1

My question is: why are objects with the same hash not treated as the same object in a set?

Answer 1

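Sets need two methods to make an object hashable: __hash__ and __eq__. Two instances that are considered equal must return the same hash value. An instance is considered already present in a set only if its hash is present in the set and the instance compares equal to one of the instances in the set with that same hash.

Your class does not implement __eq__, so the default object.__eq__ is used instead, which returns true only if obj1 is obj2 is also true. In other words, two instances are considered equal only if they are the exact same instance.

As far as a set is concerned, matching hashes alone do not make two objects the same; even objects with different hashes can end up in the same hash table slot, because the slot is chosen from the hash modulo the table size.

Add a custom __eq__ method that returns True when two instances are supposed to be equal: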
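To see this two-step check in action, here is a minimal sketch. The `Traced` class and its `calls` list are hypothetical names introduced only for illustration; the class records each `__eq__` call so you can see that the set falls back to equality once the hashes match.

```python
class Traced:
    """Hypothetical class: constant hash, identity-based equality, logs __eq__ calls."""

    calls = []  # every (self.label, other.label) pair passed to __eq__

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return 0  # every instance collides on purpose

    def __eq__(self, other):
        Traced.calls.append((self.label, getattr(other, "label", None)))
        return self is other  # same result as the default object.__eq__


a, b = Traced("a"), Traced("b")
s = set()
s.add(a)  # empty set: nothing to compare against
s.add(b)  # same hash as a, so the set asks __eq__, which says "not equal"

print(len(s))        # 2
print(Traced.calls)  # at least one comparison was needed to tell a and b apart
```

Because `__eq__` still compares by identity, both instances stay in the set even though their hashes are identical.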

def __eq__(self, other):
    if not isinstance(other, type(self)):
        return False
    # all instances of this class are considered equal to one another
    return True

Answer 2

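Your reading is incorrect. The __eq__ method is used for equality checks. The documentation only states that the __hash__ value must also be the same for two objects a and b for which a == b (i.e. a.__eq__(b)) is true.

This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication is not an equivalence: the converse, that hash(a) == hash(b) would mean a == b, does not follow.

To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python will compare their identities instead. This might do: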
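The one-directional nature of the rule can be checked with built-in types. This is a small sketch; the equal-hash pair at the end relies on a CPython implementation detail (there, hash(-1) and hash(-2) are both -2), so treat it as an illustration rather than a guarantee of the language.

```python
# Required direction: equal objects have equal hashes.
print(1 == 1.0)              # True
print(hash(1) == hash(1.0))  # True

# The converse is not required: equal hashes do not make objects equal.
# On CPython, -1 and -2 happen to share a hash (implementation detail).
if hash(-1) == hash(-2):
    print(-1 == -2)        # False
    print(len({-1, -2}))   # 2, both values are kept despite the shared hash
```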

class MyClass(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        # another object is equal to self, iff 
        # it is an instance of MyClass
        return isinstance(other, MyClass)

Now:

>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
>>> len(result)
1

In practice, you would base __hash__ on the same attributes of your object that are used for the __eq__ comparison, for example:

class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn

    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q}))  # 1
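A related point for Python 3, which the question is tagged with: a class that defines __eq__ but not __hash__ has its __hash__ implicitly set to None, so its instances cannot be added to a set at all. A minimal sketch, using a hypothetical `Point` class that is not part of the answers above:

```python
class Point:
    """Hypothetical example class: defines __eq__ but no __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


print(Point.__hash__)  # None

try:
    {Point(1, 2)}  # building a set needs a hash for each element
except TypeError as exc:
    print("unhashable:", exc)
```

This is why the Person example defines both methods together.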

Python - class hash method and set
