为什么这个 O(n^2) 解决方案比 O(n) 解决方案工作得更快?

Why does this O(n^2) solution work faster than the O(n) solution?

编辑:抱歉!看来我不小心复制粘贴了相同的代码。


问题:

Given a list of numbers from 0 to N-1, your job is to find the missing number. Your program should take in a list of integers and return a single integer (the missing number).

[0, 1, 3, 4, 5] 的输入应该 return 2.

我找到了两个解决方案,一个我认为在 O(n) 内,一个在 (On^2) 内。

O(n) 解

def alt_find_missing(input_list):
    input_sum = sum(input_list)
    range_sum = sum([i for i in range(min(input_list), max(input_list)+1)])
    return range_sum - input_sum

O(n^2) 解

def find_missing(input_list):
    input_set = set(input_list)
    for i in range(min(input_list), max(input_list)+1):
        if i not in input_set:
            return i

但是,在多个 timeit 测试中,O(n^2) 解的运行速度比 O(n) 快:

List with 4 elements, using for ... in set (x 1 000 000)
1.1550223514080038
List with 4 elements, using difference of sums (x 1 000 000)
1.391524411772641
List with 100 elements, using for ... in set (x 1 000 000)
8.43574248785071
List with 100 elements, using difference of sums (x 1 000 000)
8.94695660741872
List with 1000 elements, using for ... in set (x 100 000)
8.1138781071155
List with 1000 elements, using difference of sums (x 100 000)
8.383110816298519

这是为什么?

第二种算法不是O(n^2)。集合查找的复杂度为 O(1),遍历集合的复杂度为 O(n)。所以两种算法的区别是由于不同的常数因子。

这是一个简短的脚本,显示了两个函数的线性行为

import timeit
from functools import partial
from matplotlib import pyplot as plt

def alt_find_missing(input_list):
    input_sum = sum(input_list)
    range_sum = sum([i for i in range(min(input_list), max(input_list)+1)])
    return range_sum - input_sum


def find_missing(input_list):
    input_set = set(input_list)
    for i in range(min(input_list), max(input_list)+1):
        if i not in input_set:
            return i


t1=[]
t2=[]

rng=range(10000,1000000,10000)

for n in rng:
    func1=partial(find_missing,range(n))
    func2=partial(alt_find_missing,range(n))
    t1.append(timeit.timeit(func1,number=5))
    t2.append(timeit.timeit(func2,number=5)) 

plt.plot(rng,t1, label='Find missing')
plt.plot(rng,t2, label='Alt find missing')
plt.ylabel('Execution time')
plt.xlabel('Problem size')
plt.legend()
plt.show()

在我看来,结果看起来非常线性 :) 当然,在某些问题规模上会发生一些奇怪的事情,但不会使结果明显超出线性区域。

问候。