Efficient reverse order comparison of huge growing list in Python

Question

In Python, my goal is to maintain a list of unique points (complex scalars, rounded) while steadily creating new ones with a function, as in this pseudocode:

list_of_points = []

while True:
   # generate new point according to some rule
   z = generate() 

   # check whether this point is already there
   if z not in list_of_points:
      list_of_points.append(z)
   
   if some_condition:
      break

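Now list_of_points can become really huge in the process (e.g. 10 million entries or even more), and duplicates are quite frequent. In fact, about 50% of the time a newly created point is already somewhere in the list. What I do know, however, is that the already existing point is usually near the end of the list. Sometimes it is in the "bulk", and only very occasionally is it found near the beginning.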

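This brought me to the idea of searching in reverse order. But how would I do this most efficiently (in terms of raw speed), given the potentially huge list that keeps growing in the process? Is the list container even the best choice here?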

I managed to gain some performance by doing this:

list_of_points = []

while True:
   # generate new point according to some rule
   z = generate() 

   # check very end of list
   if z in list_of_points[-10:]:
      continue

   # check deeper into the list
   if z in list_of_points[-100:-10]:
      continue

   # check the rest
   if z not in list_of_points[:-100]:
      list_of_points.append(z)

   if some_condition:
      break

Obviously, this is not very elegant. Using a second, FIFO-type container (collections.deque) instead gives about the same speedup.
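The deque variant is not shown in the question, so the following is only a minimal sketch of what it might look like. The helper name add_if_new and the window size of 100 are assumptions made for illustration, not the asker's actual code.

```python
from collections import deque

def add_if_new(points, recent, z):
    """Append z to points unless it is already present.

    `recent` is a bounded deque holding the most recently appended points;
    it is checked first because duplicates tend to sit near the end.
    """
    if z in recent:          # cheap check of the recent window
        return False
    if z in points:          # fall back to a full linear scan
        return False
    points.append(z)
    recent.append(z)         # oldest entry drops out once maxlen is reached
    return True

list_of_points = []
recent_points = deque(maxlen=100)  # hypothetical window size

first = add_if_new(list_of_points, recent_points, complex(1.0, 2.0))
second = add_if_new(list_of_points, recent_points, complex(1.0, 2.0))
```

The full scan in the fallback is still linear in the list length, so this only helps when the duplicate is found inside the recent window.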

Answer 1

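Your best bet is probably to use a set instead of a list. Python sets use hashing to store and look up items, so insertion and membership tests are very fast. Also, you can skip the step of checking whether the item is already in the list and simply try to add it: if it is already in the set, it will not be added again, since duplicates are not allowed.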

Borrowing your pseudocode example:

set_of_points = set()  # note: {} would create an empty dict, not a set

while True:
   # get size of set
   a = len(set_of_points)

   # generate new point according to some rule
   z = generate() 

   # try to add z to the set
   set_of_points.add(z)

   b = len(set_of_points)

   # if a == b  it was not added, thus already existed in the set

   if some_condition:
      break
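A runnable version of the same idea might look like the sketch below. The generate function here is a hypothetical stand-in for the asker's rule (it draws rounded complex points from a small grid so that duplicates are frequent), and the extra list is only needed if the first-seen order of the points matters.

```python
import random

def generate(rng):
    # Hypothetical stand-in for the real rule: a rounded complex point
    # on an 11 x 11 grid, so repeats happen often.
    return complex(round(rng.uniform(0, 1), 1), round(rng.uniform(0, 1), 1))

def collect_unique(n_iterations, seed=0):
    rng = random.Random(seed)
    seen = set()      # hash-based membership test, O(1) on average
    ordered = []      # preserves the order in which points first appeared
    for _ in range(n_iterations):
        z = generate(rng)
        if z not in seen:
            seen.add(z)
            ordered.append(z)
    return ordered

points = collect_unique(10_000)
```

Complex numbers are hashable, so they can be stored in a set directly. Points that differ only by floating-point noise hash differently, which is why the rounding step has to happen before the membership test.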

Answer 2

Use a set. That's what sets are for. Ah, you already got that answer. So my other comment: this part of your code appears to be incorrect:

   # check the rest
   if z not in list_of_points[100:]:
      list_of_points.append(z)

In context, I believe you intended to write list_of_points[:-100] there. You have already checked the last 100, but as written you are now skipping the check of the first 100.

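But better still, use plain list_of_points. As the list grows longer, the cost of possibly making 100 redundant comparisons becomes trivial compared to the cost of copying len(list_of_points) - 100 elements.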
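If a list is kept anyway, one way to search from the end without any slice copies is to run the membership test against reversed(), which returns an iterator over the existing list. This is a sketch rather than something proposed in the thread, and the function name contains_from_end is made up for illustration.

```python
def contains_from_end(points, z):
    # reversed() on a list yields an iterator and copies no elements;
    # the `in` test walks it from the last item and stops at the first match.
    return z in reversed(points)

list_of_points = [complex(i, -i) for i in range(1000)]

hit_near_end = contains_from_end(list_of_points, complex(999, -999))
hit_at_start = contains_from_end(list_of_points, complex(0, 0))
miss = contains_from_end(list_of_points, complex(5, 5))
```

The scan is still linear in the worst case (a miss, or a hit near the beginning), so for membership tests alone a set remains the faster option.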

python

performance

numpy

unique

fifo