“列表中的对象”与“字典中的对象”的行为不同？

Question

我有一个迭代器，里面有一些对象，我想创建一个 uniqueUsers 集合，我只在其中列出每个用户一次。所以玩了一下我用列表和字典试了一下：

>>> for m in ms: print m.to_user  # let's first look what's inside ms
...
Pete Kramer
Pete Kramer
Pete Kramer
>>> 
>>> uniqueUsers = []  # Create an empty list
>>> for m in ms:
...     if m.to_user not in uniqueUsers:
...         uniqueUsers.append(m.to_user)
...
>>> uniqueUsers
[Pete Kramer]  # This is what I would expect
>>> 
>>> uniqueUsers = {}  # Now let's create a dict
>>> for m in ms:
...     if m.to_user not in uniqueUsers:
...         uniqueUsers[m.to_user] = 1
...
>>> uniqueUsers
{Pete Kramer: 1, Pete Kramer: 1, Pete Kramer: 1}

所以我在执行 if 语句时通过将 dict 转换为列表来对其进行测试，结果如我所料：

>>> uniqueUsers = {}
>>> for m in ms:
...     if m.to_user not in list(uniqueUsers):
...         uniqueUsers[m.to_user] = 1
...
>>> uniqueUsers
{Pete Kramer: 1}

我可以通过针对 uniqueUsers.keys() 进行测试得到类似的结果。

问题是我不明白为什么会出现这种差异。我一直认为如果你这样做 if object in dict，它只是创建一个字典键列表并再次测试它，但显然不是这样。

谁能解释一下 object in dict 的内部工作原理以及为什么它的行为与 object in list 不同（正如我所期望的那样）？

Answer 1

为了了解发生了什么，您必须了解 in 运算符 membership test 对不同类型的行为。

对于列表，这非常简单，因为列表的本质是：不关心重复项的有序数组。此处执行成员资格测试的唯一可能方法是遍历列表并检查 equality 上的每个项目。像这样：

# x in lst
for item in lst:
    if x == item:
        return True
return False

字典有点不同：它们是散列 tables，键是唯一的。 Hash tables 要求键是 hashable 这实质上意味着需要有一个显式函数将对象转换为整数。然后使用此哈希值将 key/value 映射放入哈希 table.

的某处

由于哈希值决定了项目在哈希 table 中的位置，因此本应相同的对象产生相同的哈希值至关重要。所以下面的含义必须为真：x == y => hash(x) == hash(y)。不过，反之亦然。让不同的对象产生相同的哈希值是完全有效的。

当对字典执行成员资格测试时，字典会首先查找哈希值。如果它能找到它，那么它将对它找到的所有项目执行相等性检查；如果它没有找到哈希值，那么它假定它是一个不同的对象：

# x in dct
h = hash(x)
items = getItemsForHash(dct, h)
for item in items:
    if x == item:
        return True
# items is empty, or no match inside the loop
return False

由于您在对列表使用成员资格测试时得到了期望的结果，这意味着您的对象实现了相等比较（__eq__) correctly. But since you do not get the correct result when using a dictionary, there seems to be a __hash__ 与相等比较实现不同步的实现：

>>> class SomeType:
        def __init__ (self, x):
            self.x = x
        def __eq__ (self, other):
            return self.x == other.x
        def __hash__ (self):
            # bad hash implementation
            return hash(id(self))

>>> l = [SomeType(1)]
>>> d = { SomeType(1): 'x' }
>>> x = SomeType(1)
>>> x in l
True
>>> x in d
False

请注意，对于 Python 2 中的新样式类（继承自 object 的类），此“错误的哈希实现”（基于对象 ID) 是默认值。所以当你不实现自己的 __hash__ 函数时，它仍然使用那个函数。这最终意味着，除非您的 __eq__ 仅执行身份检查（默认），否则哈希函数将不同步。

因此解决方案是以与 __eq__ 中使用的规则一致的方式实施 __hash__。例如，如果您比较两个成员 self.x 和 self.y，那么您应该对这两个成员使用复合散列。最简单的方法是 return 这些值的元组的散列值：

class SomeType (object):
    def __init__ (self, x, y):
        self.x = x
        self.y = y

    def __eq__ (self, other):
        return self.x == other.x and self.y == other.y

    def __hash__ (self):
        return hash((self.x, self.y))

请注意，如果对象是 mutable:

，则不应使该对象可哈希

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

Answer 2

TL;DR：in 测试调用 __eq__ 列表。对于字典，它首先调用 __hash__，如果哈希匹配，则调用 __eq__.

in 测试只为列表调用 __eq__。
- 没有 __eq__，in-ness 比较总是 False。
对于字典，你需要一个正确实现的__hash__和__eq__才能比较其中的对象正确:
- 首先从__hash__
  获取对象的散列
  - 没有 __hash__，对于新式 classes，它使用 id()，它对所有创建的对象都是唯一的，因此永远不会匹配现有对象，除非它是 相同的对象。
  - 正如@poke 在评论中指出的那样：
    
    In Python 2, new style classes (inheriting from object) inherit object’s __hash__ implementation which is based on id(), so that’s where that comes from.
- 如果散列匹配，则然后 __eq__ 为该对象调用 other.
  - 然后结果取决于__eq__ returns.
- 如果散列不匹配，则__eq__未调用。

因此 in 测试调用 __eq__ 用于列表和字典...但对于字典，仅在 __hash__[= 之后92=] return 是一个匹配的散列。 而没有 __hash__ 不会 return None，不会抛出错误也不会做到 "unhashable"。 ...in Python 2. 要正确使用 to_user class 作为字典键，您需要有一个正确实现的 __hash__ method，与 __eq__.

详情：

m.to_user not in uniqueUsers "object in list" 的检查工作正常，因为您可能已经实现了 __eq__ 方法，正如@poke 指出的那样。（看起来 to_user return 是一个对象，而不是字符串。）

同样的检查对 "object in dict" 不起作用，因为：
(a) __hash__ 因为 class 实施不当，正如@poke 也指出的那样。
(b) 或者你根本没有实现__hash__。这不会在 Python2 new-style classes.
中引发错误
以the class in this answer为起点：

>>> class Test2(object): ... def __init__(self, name): ... self.name = name ... ... def __eq__(self, other): ... return self.name == other.name ... >>> test_Dict = {} >>> test_List = [] >>> >>> obj1 = Test2('a') >>> obj2 = Test2('a') >>> >>> test_Dict[obj1] = 'x' >>> test_Dict[obj2] = 'y' >>> >>> test_List.append(obj1) >>> test_List.append(obj2) >>> >>> test_Dict {<__main__.Test2 object at 0x0000000002EFC518>: 'x', <__main__.Test2 object at 0x0000000002EFC940>: 'y'} >>> test_List [<__main__.Test2 object at 0x0000000002EFC518>, <__main__.Test2 object at 0x0000000002EFC940>] >>> >>> Test2('a') in test_Dict False >>> Test2('a') in test_List True

“列表中的对象”与“字典中的对象”的行为不同？

`object in list` behaves different from `object in dict`?

python

dictionary

if-statement

cpython

list