How do I make sure an attribute value is unique in Python?
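I'm scraping a website that contains a list of people. The same person can appear more than once, and several people can share the same name: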
Tommy Atkins (id:312)
Tommy Atkins (id:183)
Tommy Atkins (id:312)
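I'd like to create one object per person and discard the duplicates.
I'm currently using a list comprehension to loop over all instances of the class and check whether the key is already in use. Is there an easier way?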
class Object:
    def __init__(self, key):
        # Scan every existing instance for a matching key (O(N) per new object).
        if [object for object in objects if object.key == key]:
            raise Exception('key {} already exists'.format(key))
        else:
            self.key = key

objects = []
objects.append(Object(1))
objects.append(Object(1))  # Exception: key 1 already exists
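A global store for your IDs is fine, but it's better to use a set rather than a list: a membership test such as i in some_set is O(1) on average, while i in some_list is O(N).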
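A minimal sketch of that suggestion, assuming a module-level set called seen_keys and a class called Person (both names are hypothetical, not from the question). It raises ValueError rather than the bare Exception used in the question, which is a stylistic choice only:

```python
# Hypothetical module-level registry of keys that are already in use.
seen_keys = set()


class Person:
    def __init__(self, key):
        # Set membership is a hash lookup, O(1) on average.
        if key in seen_keys:
            raise ValueError('key {} already exists'.format(key))
        seen_keys.add(key)
        self.key = key


people = []
for key in (312, 183, 312):
    try:
        people.append(Person(key))
    except ValueError:
        pass  # duplicate id: skip it

print([p.key for p in people])  # [312, 183]
```

Only the keys are stored in the set, so the instances themselves do not need to be hashable.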
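Define __eq__ and __hash__ in your class so that instances are compared by the value of key and hashed from it. Then use a set instead of a list, since it filters out duplicates for you efficiently: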
class Object:
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        # Two instances are equal when their keys are equal.
        if isinstance(other, type(self)):
            return self.key == other.key
        return NotImplemented

    def __ne__(self, other):
        return not type(self).__eq__(self, other)

    def __hash__(self):
        # Equal objects must hash equally, so hash the key.
        return hash(self.key)

objects = set()
o1 = Object(1)
o2 = Object(1)
objects.add(o1)
objects.add(o2)
print(o1, o2)   # <__main__.Object object at 0x105996ba8> <__main__.Object object at 0x105996be0>
print(objects)  # {<__main__.Object object at 0x105996ba8>}
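Applied to the lines from the question, the same approach might look like the sketch below. The Person class, the regular expression, and the assumed line format "<name> (id:<digits>)" are illustrative guesses, not something the question specifies:

```python
import re


class Person:
    def __init__(self, name, key):
        self.name = name
        self.key = key

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.key == other.key
        return NotImplemented

    def __hash__(self):
        return hash(self.key)


# Sample lines copied from the question.
lines = [
    'Tommy Atkins (id:312)',
    'Tommy Atkins (id:183)',
    'Tommy Atkins (id:312)',
]

# Assumed line format: "<name> (id:<digits>)".
pattern = re.compile(r'^(?P<name>.+) \(id:(?P<key>\d+)\)$')

people = set()
for line in lines:
    match = pattern.match(line)
    if match:
        # The set silently drops a Person whose key it already holds.
        people.add(Person(match.group('name'), int(match.group('key'))))

print(sorted(p.key for p in people))  # [183, 312]
```

Because equality depends only on key, the two people named Tommy Atkins with different ids are both kept, while the repeated id 312 is stored once.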
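Don't keep an instance permanently assigned to a variable, otherwise it won't be garbage collected (note that the behaviour shown here applies only to CPython):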
objects = set()
for _ in range(5):
    ins = Object(1)
    print(id(ins))
    objects.add(ins)
Output:
4495640448 # First instance and this is now stored in the set
# hence it is not going to be garbage collected.
4495640840 # Python is now using new memory space.
4495640896 # Right now 4495640840 is still owned by the
# previous instance, hence use new memory address
# But after this assignment the instance at 4495640840
# has no more references, i.e. ins now points to 4495640896
4495640840 # Re-use 4495640840
4495640896 # Repeat...