哪个更快?检查 Python 列表中是否有内容? IE。会员资格 vs non-membership
Which is faster? Checking if something is in a Python list or not? I.e. membership vs non-membership
对于那些比我了解更多计算机科学的人来说,这可能是一个菜鸟问题或显而易见的问题。也许这就是为什么我在搜索后无法从 Google 或 SO 中找到任何内容的原因。也许我没有使用正确的词汇。
标题说明了一切。如果我知道 x
大多数时候在 my_list
中,下面哪个更快?
if x in my_list:
func1(x)
else:
func2(x)
或
if x not in my_list:
func2(x)
else:
func1(x)
列表的大小重要吗?例如。十个元素 vs 10,000 个元素?对于我的特殊情况,my_list
由字符串和整数组成,但是有人知道其他考虑因素是否适用于更复杂的类型,例如字典吗?
谢谢。
检查元素是否在列表中或者元素是否不在调用相同操作的列表中x in my_list
,所以应该没有任何区别。
Does the size of the list matter?
检查元素是否在列表中是一个 O(N) 操作,这意味着大小确实很重要,大致成比例。
如果你需要做很多检查,你可能想查看 set
,检查一个元素是否在 set
中是 O(1),这意味着检查时间确实随着 set
的大小增加,变化不大。
应该没有明显的性能差异。您最好编写使您的代码更具可读性的任何一个。任何一个都是 O(n) 复杂度,并且主要取决于元素在列表中的位置。此外,您还应该避免过早优化,这对大多数用例来说都无关紧要,而当它发生时,您通常最好还是使用其他数据结构。
如果您想以更快的速度进行查找,请使用字典,它们的复杂度可能为 O(1)。
详情请参考https://wiki.python.org/moin/TimeComplexity .
软件实现中的一个理想特性是 coupling。你的实现不应该由你的 Python 解释器测试列表成员的方式来定义,因为这是一个高层次的耦合。可能是实施发生了变化,不再是更快的方式。
在这种情况下,我们应该关心的是,测试列表中的成员资格与列表的大小呈线性关系。如果需要更快的成员资格测试,您可以使用集合。
Python 包含一个模块和函数 timeit
,可以告诉您执行一段代码需要多长时间。该片段必须是单个语句,这样就不会像 if
那样直接对复合语句进行计时,但我们可以将您的语句包装在一个函数中并为函数调用计时。
比调用 timeit.timeit()
更容易的是使用 jupyter notebook 并在行首使用魔法 %timeit
魔法语句。
这证明了长列表或短列表,成功或失败,您询问的两种方式,检查 in alist
或 not in alist
,给出的时间在测量的可变性内是相同的。
import random
# set a seed so results will be repeatable
random.seed(456789)
# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)]
def func1(x):
# included just so we get a function call
return True
def func2(x):
# included just so we get a function call
return False
def way1(x):
if x in my_list:
result = func1(x)
else:
result = func2(x)
return result
def way2(x):
if x not in my_list:
result = func2(x)
else:
result = func1(x)
return result
%timeit way1(101) # failure with large list
The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop
%timeit way1(0) # success with large list
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop
my_list.index(0)
186
%timeit way2(101) # failure with large list
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop
%timeit way2(0) # success with large list
The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop
my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value
-37
# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop
# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop
对于那些比我了解更多计算机科学的人来说,这可能是一个菜鸟问题或显而易见的问题。也许这就是为什么我在搜索后无法从 Google 或 SO 中找到任何内容的原因。也许我没有使用正确的词汇。
标题说明了一切。如果我知道 x
大多数时候在 my_list
中,下面哪个更快?
if x in my_list:
func1(x)
else:
func2(x)
或
if x not in my_list:
func2(x)
else:
func1(x)
列表的大小重要吗?例如。十个元素 vs 10,000 个元素?对于我的特殊情况,my_list
由字符串和整数组成,但是有人知道其他考虑因素是否适用于更复杂的类型,例如字典吗?
谢谢。
检查元素是否在列表中或者元素是否不在调用相同操作的列表中x in my_list
,所以应该没有任何区别。
Does the size of the list matter?
检查元素是否在列表中是一个 O(N) 操作,这意味着大小确实很重要,大致成比例。
如果你需要做很多检查,你可能想查看 set
,检查一个元素是否在 set
中是 O(1),这意味着检查时间确实随着 set
的大小增加,变化不大。
应该没有明显的性能差异。您最好编写使您的代码更具可读性的任何一个。任何一个都是 O(n) 复杂度,并且主要取决于元素在列表中的位置。此外,您还应该避免过早优化,这对大多数用例来说都无关紧要,而当它发生时,您通常最好还是使用其他数据结构。
如果您想以更快的速度进行查找,请使用字典,它们的复杂度可能为 O(1)。 详情请参考https://wiki.python.org/moin/TimeComplexity .
软件实现中的一个理想特性是 coupling。你的实现不应该由你的 Python 解释器测试列表成员的方式来定义,因为这是一个高层次的耦合。可能是实施发生了变化,不再是更快的方式。
在这种情况下,我们应该关心的是,测试列表中的成员资格与列表的大小呈线性关系。如果需要更快的成员资格测试,您可以使用集合。
Python 包含一个模块和函数 timeit
,可以告诉您执行一段代码需要多长时间。该片段必须是单个语句,这样就不会像 if
那样直接对复合语句进行计时,但我们可以将您的语句包装在一个函数中并为函数调用计时。
比调用 timeit.timeit()
更容易的是使用 jupyter notebook 并在行首使用魔法 %timeit
魔法语句。
这证明了长列表或短列表,成功或失败,您询问的两种方式,检查 in alist
或 not in alist
,给出的时间在测量的可变性内是相同的。
import random
# set a seed so results will be repeatable
random.seed(456789)
# a 10K long list of junk with no value greater than 100
my_list = [random.randint(-100, 100) for i in range(10000)]
def func1(x):
# included just so we get a function call
return True
def func2(x):
# included just so we get a function call
return False
def way1(x):
if x in my_list:
result = func1(x)
else:
result = func2(x)
return result
def way2(x):
if x not in my_list:
result = func2(x)
else:
result = func1(x)
return result
%timeit way1(101) # failure with large list
The slowest run took 8.29 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 207 µs per loop
%timeit way1(0) # success with large list
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.04 µs per loop
my_list.index(0)
186
%timeit way2(101) # failure with large list
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 208 µs per loop
%timeit way2(0) # success with large list
The slowest run took 7.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.01 µs per loop
my_list = my_list[:10] # now make it a short list
print(my_list[-1]) # what is the last value
-37
# Run the same stuff again against the smaller list, showing that it is
# much faster but still way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 18.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 417 ns per loop
The slowest run took 13.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 403 ns per loop
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 427 ns per loop
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 386 ns per loop
# run the same again to get an idea of variability between runs so we can
# be sure that way1 and way2 have no significant differences
%timeit way1(101) # failure with small list
%timeit way1(-37) # success with small list
%timeit way2(101) # failure with small list
%timeit way2(-37) # success with small list
The slowest run took 8.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 406 ns per loop
The slowest run took 4.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 412 ns per loop
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 398 ns per loop