哪个更快?检查 Python 列表中是否有内容? IE。会员资格 vs non-membership

Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership

对于那些比我了解更多计算机科学的人来说,这可能是一个菜鸟问题或显而易见的问题。也许这就是为什么我在搜索后无法从 Google 或 SO 中找到任何内容的原因。也许我没有使用正确的词汇。

标题说明了一切。如果我知道 x 大多数时候在 my_list 中,下面哪个更快?

if x in my_list:
    func1(x)
else:
    func2(x)

if x not in my_list:
    func2(x)
else:
    func1(x)

列表的大小重要吗?例如。十个元素 vs 10,000 个元素?对于我的特殊情况,my_list 由字符串和整数组成,但是有人知道其他考虑因素是否适用于更复杂的类型,例如字典吗?

谢谢。

检查元素是否在列表中或者元素是否不在调用相同操作的列表中x in my_list,所以应该没有任何区别。

Does the size of the list matter?

检查元素是否在列表中是一个 O(N) 操作,这意味着大小确实很重要,大致成比例。

如果你需要做很多检查,你可能想查看 set,检查一个元素是否在 set 中是 O(1),这意味着检查时间确实随着 set 的大小增加,变化不大。

应该没有明显的性能差异。您最好编写使您的代码更具可读性的任何一个。任何一个都是 O(n) 复杂度,并且主要取决于元素在列表中的位置。此外,您还应该避免过早优化,这对大多数用例来说都无关紧要,而当它发生时,您通常最好还是使用其他数据结构。

如果您想以更快的速度进行查找,请使用字典,它们的复杂度可能为 O(1)。 详情请参考https://wiki.python.org/moin/TimeComplexity .

软件实现中的一个理想特性是 coupling。你的实现不应该由你的 Python 解释器测试列表成员的方式来定义,因为这是一个高层次的耦合。可能是实施发生了变化,不再是更快的方式。

在这种情况下,我们应该关心的是,测试列表中的成员资格与列表的大小呈线性关系。如果需要更快的成员资格测试,您可以使用集合。

Python 包含一个模块和函数 timeit,可以告诉您执行一段代码需要多长时间。该片段必须是单个语句,这样就不会像 if 那样直接对复合语句进行计时,但我们可以将您的语句包装在一个函数中并为函数调用计时。

比调用 timeit.timeit() 更容易的是使用 jupyter notebook 并在行首使用魔法 %timeit 魔法语句。

这证明了长列表或短列表,成功或失败,您询问的两种方式,检查 in alistnot in alist,给出的时间在测量的可变性内是相同的。

import random

# set a seed so results will be repeatable
random.seed(456789)

# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)] 

def func1(x):
    # included just so we get a function call
    return True

def func2(x):
    # included just so we get a function call
    return False

def way1(x):
    if x in my_list:
        result = func1(x)
    else:
        result = func2(x)
    return result

def way2(x):
    if x not in my_list:
        result = func2(x)
    else:
        result = func1(x)
    return result

%timeit way1(101) # failure with large list

The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop

%timeit way1(0) # success with large list

The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop

my_list.index(0)

186

%timeit way2(101) # failure with large list

The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop

%timeit way2(0) # success with large list

The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop

my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value

-37

# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop

# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list

The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop