How do I make sure an attribute value is unique in Python?

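I'm scraping a website that contains a list of people. The same person can appear more than once, and several people can share the same name: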

Tommy Atkins (id:312)
Tommy Atkins (id:183)
Tommy Atkins (id:312)

I want to create one object per person and discard the duplicates.

I'm currently using a list comprehension to go through all instances of the class and check whether the key is already in use. Is there a simpler way?

class Object:
    def __init__(self, key):
        if [object for object in objects if object.key == key]:
            raise Exception('key {} already exists'.format(key))
        else: self.key = key

objects = []
objects.append(Object(1))
objects.append(Object(1)) # Exception: key 1 already exists

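A global store for your IDs is fine, but it is better to use a set than a list: a membership test on a set (i in some_set) is O(1) on average, while the same test on a list (i in some_list) is O(N).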
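
As a minimal sketch of that suggestion, the used keys can live in a set stored on the class itself. The class name `Person` and the attribute `_seen_keys` are illustrative names, not something from the original post.

```python
class Person:
    # Hypothetical class-level registry of keys that are already in use.
    _seen_keys = set()

    def __init__(self, key):
        # Set membership is a hash lookup, O(1) on average.
        if key in Person._seen_keys:
            raise ValueError('key {} already exists'.format(key))
        Person._seen_keys.add(key)
        self.key = key


people = [Person(312), Person(183)]
try:
    people.append(Person(312))
except ValueError as exc:
    message = str(exc)
    print(message)  # key 312 already exists
```

Raising `ValueError` instead of a bare `Exception` is a design choice here: it lets the caller catch the duplicate case specifically and skip that row.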

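Define __eq__ and __hash__ in your class so that instances are compared by the value of key and hashed from it. Then use a set instead of a list, because it filters out duplicates for you efficiently: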

class Object:
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.key == other.key 
        return NotImplemented

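    def __ne__(self, other):
        # Only needed on Python 2; Python 3 derives != from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result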

    def __hash__(self):
        return hash(self.key)


objects = set()
o1 = Object(1)
o2 = Object(1)
objects.add(o1)
objects.add(o2)

print(o1, o2)   # <__main__.Object object at 0x105996ba8> <__main__.Object object at 0x105996be0>
print(objects)  # {<__main__.Object object at 0x105996ba8>}
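
Applied to the scraped list from the question, the same idea might look like the sketch below. The `rows` data, the `Person` class and its `name` attribute are assumptions for illustration. Only the id takes part in equality, so two people who share a name stay distinct.

```python
class Person:
    def __init__(self, name, key):
        self.name = name
        self.key = key

    def __eq__(self, other):
        # Two Person objects are equal when their ids match.
        if isinstance(other, Person):
            return self.key == other.key
        return NotImplemented

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.key)

    def __repr__(self):
        return 'Person({!r}, {})'.format(self.name, self.key)


# Hypothetical scraped rows: (name, id)
rows = [('Tommy Atkins', 312), ('Tommy Atkins', 183), ('Tommy Atkins', 312)]

# The set comprehension drops the repeated id 312.
people = {Person(name, key) for name, key in rows}
unique_ids = sorted(p.key for p in people)
print(unique_ids)  # [183, 312]
```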

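Do not keep an instance permanently assigned to a variable, otherwise it will not be garbage collected (note that the behaviour shown here applies only to CPython):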

objects = set()

for _ in range(5):
    ins = Object(1)
    print(id(ins))
    objects.add(ins)

Output:

4495640448 # First instance; it is now stored in the set,
           # so it will not be garbage collected.
4495640840 # Python allocates a new memory address.
4495640896 # At this point 4495640840 is still owned by the
           # previous instance, so a new address is used.
           # After this assignment the instance at 4495640840
           # has no more references, i.e. ins now points to 4495640896.
4495640840 # 4495640840 is reused.
4495640896 # And so on...
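
One way to observe the same effect without comparing addresses is a weak reference, which does not keep its target alive. The sketch below assumes CPython's reference counting; on other interpreters the rejected duplicate may linger until a collection cycle runs. The variable names are illustrative.

```python
import weakref


class Object:
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.key == other.key
        return NotImplemented

    def __hash__(self):
        return hash(self.key)


objects = set()

first = Object(1)
objects.add(first)              # stored: the set now holds a reference
first_ref = weakref.ref(first)

dup = Object(1)
objects.add(dup)                # equal to an existing member, so not stored
dup_ref = weakref.ref(dup)

del first, dup                  # drop our own names for both instances

first_alive = first_ref() is not None   # still referenced by the set
dup_alive = dup_ref() is not None       # no references left
print(first_alive, dup_alive)   # True False on CPython
```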