Efficient list manipulation in Python

Question

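I have a large list and regularly need to find an item that satisfies a rather complex condition (not equality), i.e. I am forced to check every item in the list until I find one. The conditions change, but some items match more often than others. So I'd like to bring a matching item to the front of the list each time I find one, so that frequently matching items are found quicker.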

Is there an efficient, Pythonic way to do this?

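Sequences ([]) are backed by an array, so removing an item somewhere in the middle and prepending it to the array means moving every item that came before it. That takes O(n) time, which is not good.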

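In C you could build a linked list and move the item yourself when it is found. In Python there is a deque, but afaik you can't reference the node objects or access the .next pointers.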

And a self-made linked list is very slow in Python. (In fact it's slower than an ordinary linear search that doesn't move any items.)

Sadly, a dict or set finds items based on value equality and thus doesn't fit my problem.

As an example, here's the condition:

u, v, w = n.value   # list item
if v in g[u] and w in g[v] and u not in g[w]:
    ...

Answer 1

Think Pythonic. As Ed Post once said, "The determined Real Programmer can write FORTRAN programs in any language" -- this generalizes... you're trying to write C in Python and it's not working well for you:-)

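Instead, consider putting an auxiliary dict cache side by side with the list -- caching the indices at which items are found (needing invalidation only on "deep" changes to the list's structure). Much simpler and faster...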

This is probably best done by keeping the list and the dict together in a small class:

class Seeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}

    def find(self, value):
        where = self.d.get(value)
        if where is None:
            # lists have no .find(); .index() raises ValueError if absent
            self.d[value] = where = self.l.index(value)
        return where

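    def __setitem__(self, index, value):
        # assumes an integer index: drop cached positions of both the
        # value being overwritten and the value being stored
        self.d.pop(self.l[index], None)
        self.d.pop(value, None)
        self.l[index] = value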

    # and so on for other mutators that invalidate self.d; then,

    def __getattr__(self, name):
        # delegate everything else to the list
        return getattr(self.l, name)

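You need only define the mutators you actually need to use -- e.g., if you won't be using insert, sort, __delitem__, &c, there's no need to define those; you can just delegate them to the list.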

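Added: in Python 3.2 or later, functools.lru_cache can actually do most of the work for you -- use it to decorate find and you'll get a better implementation of the caching, with the ability to limit the cache size if you so wish. To clear the cache, you'll need to call self.find.cache_clear() in the appropriate spots (wherever the code above invalidates self.d) -- unfortunately, at the time of writing that crucial functionality isn't (yet!-) documented (the volunteers updating the docs are not the same ones updating the code...!-)... but, trust me, it won't disappear on you:-).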
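A minimal sketch of that lru_cache variant, under some stated assumptions: the class name CachedSeeker is made up for illustration, lookups are by value equality (so values must be hashable), and only __setitem__ is shown as a mutator. Note that decorating a method this way keeps one cache shared by all instances, keyed on (self, value), so cache_clear() empties it for every instance and cached entries keep their instances alive.

```python
from functools import lru_cache


class CachedSeeker:
    """Hypothetical list wrapper whose find() results are memoized."""

    def __init__(self, iterable=()):
        self.l = list(iterable)

    @lru_cache(maxsize=128)  # bounded cache; maxsize=None would be unbounded
    def find(self, value):
        # list.index raises ValueError if the value is absent;
        # exceptions are not cached by lru_cache
        return self.l.index(value)

    def __setitem__(self, index, value):
        self.l[index] = value
        # any structural change may invalidate cached indices
        self.find.cache_clear()

    def __getattr__(self, name):
        # delegate everything else to the underlying list
        return getattr(self.l, name)


s = CachedSeeker("abcde")
first = s.find("c")    # linear scan, result cached
second = s.find("c")   # served from the cache
s[0] = "c"             # mutator clears the cache
third = s.find("c")    # rescanned after invalidation
```

Clearing the whole cache on every mutation is the bluntest possible invalidation policy; it is simple and safe, but only pays off when lookups are much more frequent than mutations.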

Added: the OP edited the Q to clarify that he's not after "value equality", but rather some more complex set of conditions, e.g. a predicate such as:

def good_for_g(g, n):
    # for some container `g` and item value `n`:
    u, v, w = n.value
    return v in g[u] and w in g[v] and u not in g[w]

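Presumably, the desire to keep "good" items up front in turn depends on their "goodness" being "sticky", i.e., on g staying pretty much the same for a while. In this case, one can use the predicate as a feature-extracting and checking function, which forms the key into the dictionary -- e.g.: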

class FancySeeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}

    def _find_in_list(self, predicate):
        for i, n in enumerate(self.l):
            if predicate(n):
                return i
        return -1

    def find(self, predicate):
        where = self.d.get(predicate)
        if where is None:
            where = self._find_in_list(predicate)
            self.d[predicate] = where
        return where

and so on.

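So the remaining difficulty is to put the predicate in a form suitable for effective indexing into a dict. If the predicate is just a function, no problem. But if the predicate is a function with parameters, e.g. as formed by functools.partial or as a bound method of some instance, it needs a little further processing/wrapping to make the indexing work.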

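For example, two calls to functools.partial with the same bound arguments and function do NOT return equal objects -- rather, one has to check the .args and .func of the returned objects to ensure, so to speak, that a "singleton" is returned for any given (func, args) pair.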

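Moreover, if some of the bound arguments are mutable, one needs to use their id in lieu of their hash (otherwise a key built from those arguments would not be hashable). It gets even hairier for bound methods, though they can similarly be wrapped into e.g. a hashable, "equality adjusting" Predicate class.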
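One way such a wrapper might look, as a rough sketch rather than a definitive implementation: the Predicate class below keys on the function plus the id of each bound argument, so two wrappers built from the same function and the same argument objects compare equal and hash alike. The Item type and the sample g are hypothetical stand-ins for the OP's data.

```python
import functools
from collections import namedtuple

Item = namedtuple("Item", "value")  # hypothetical stand-in for list items


def good_for_g(g, n):
    u, v, w = n.value
    return v in g[u] and w in g[v] and u not in g[w]


class Predicate:
    """Hashable, equality-adjusted wrapper around func + bound args."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args  # holding references keeps the ids stable
        # identity-based key, so mutable (unhashable) args are fine
        self._key = (func, tuple(id(a) for a in args))

    def __call__(self, item):
        return self.func(*self.args, item)

    def __eq__(self, other):
        return isinstance(other, Predicate) and self._key == other._key

    def __hash__(self):
        return hash(self._key)


g = {1: {2}, 2: {3}, 3: set()}  # sample data, assumed for illustration
p1 = Predicate(good_for_g, g)
p2 = Predicate(good_for_g, g)
cache = {p1: 1}                 # e.g. index cached by an earlier search
cached_index = cache[p2]        # an equal wrapper finds the same entry

# plain partials over the same (func, args) compare by identity only
partials_equal = functools.partial(good_for_g, g) == functools.partial(good_for_g, g)
```

Because the key is based on identity, a cached result goes stale as soon as g is mutated in a way that changes which items match; the cache still has to be invalidated at that point.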

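Finally, if these gyrations prove to be too much of a bother and you really want a fast implementation of a linked list, check out https://pypi.python.org/pypi/llist/0.4 -- it's a C-coded implementation of singly- and doubly-linked lists for Python (for each kind, it implements three types: the list itself, the list node, and an iterator for the list).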

Answer 2

You can do exactly what you want using deque.rotate.

from collections import deque

class Collection:
    "Linked List collection that moves searched for items to the front of the collection"

    def __init__(self, seq):
        self._deque = deque(seq)

    def __contains__(self, target):
        for i, item in enumerate(self._deque):
            if item == target:
                self._deque.rotate(-i)   # bring the match to the left end
                self._deque.popleft()
                self._deque.rotate(i)    # restore the order of the rest
                self._deque.appendleft(item)
                return True
        return False

    def __str__(self):
        return "Collection({})".format(str(self._deque))

c = Collection(range(10))
print(c)
print("5 in c:", 5 in c)
print(c)

This gives the following output:

Collection(deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
5 in c: True
Collection(deque([5, 0, 1, 2, 3, 4, 6, 7, 8, 9]))
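Since the question asks about arbitrary conditions rather than equality, the same move-to-front idea can be sketched with a predicate. This is an illustrative variant, not part of the answer above: the class name MoveToFront and its find method are made up here, and del on a deque index is used instead of the rotate/popleft pair (both cost O(n) in the worst case).

```python
from collections import deque


class MoveToFront:
    """Hypothetical container: find() moves the first match to the front."""

    def __init__(self, iterable=()):
        self._items = deque(iterable)

    def find(self, predicate):
        for i, item in enumerate(self._items):
            if predicate(item):
                if i:  # already at the front when i == 0
                    del self._items[i]
                    self._items.appendleft(item)
                # returning right after mutating avoids resuming the
                # now-invalidated deque iterator
                return item
        return None  # no item satisfied the predicate

    def __iter__(self):
        return iter(self._items)


m = MoveToFront(range(10))
found = m.find(lambda x: x > 4 and x % 2 == 1)
order = list(m)
```

Frequently matching items drift toward the left end, so later searches with similar predicates tend to stop early.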
