Why is this O(n) three-way set disjointness algorithm slower than the O(n^3) version?

Question

This is O(n), because converting a list to a set takes O(n) time, computing the intersection takes O(n) time, and len is O(1) for a set:

def disjoint3c(A, B, C):
    """Return True if there is no element common to all three lists."""
    return len(set(A) & set(B) & set(C)) == 0

Or similarly, this one should clearly be O(n) as well:

def set_disjoint_medium (a, b, c):
    a, b, c = set(a), set(b), set(c)
    for elem in a:
        if elem in b and elem in c:
            return False
    return True

However, this O(n^3) code:

def set_disjoint_slowest (a, b, c):
    for e1 in a:
        for e2 in b:
            for e3 in c:
                if e1 == e2 == e3:
                    return False
    return True

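runs faster.

See the timings below, where algorithm one is the O(n^3) code and algorithm three is the O(n) set code. Algorithm two is actually O(n^2): it optimizes algorithm one by checking for disjointness before the third loop starts.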

Size Input (n):  10000
Algorithm One: 0.014993906021118164
Algorithm Two: 0.013481855392456055
Algorithm Three: 0.01955580711364746

Size Input (n):  100000
Algorithm One: 0.15916991233825684
Algorithm Two: 0.1279449462890625
Algorithm Three: 0.18677806854248047

Size Input (n):  1000000
Algorithm One: 1.581618070602417
Algorithm Two: 1.146049976348877
Algorithm Three: 1.8179030418395996
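The code for algorithm two is not shown in the question. Going only by the description above (skip the third loop unless the first two elements already match), it might look like the sketch below. The function name set_disjoint_quadratic is hypothetical, and this is a guess at the approach, not the asker's actual code.

```python
def set_disjoint_quadratic(a, b, c):
    """Return True if no element is common to all three lists."""
    for e1 in a:
        for e2 in b:
            # Only scan c when e1 and e2 already match.
            if e1 == e2:
                for e3 in c:
                    if e3 == e1:
                        return False
    return True
```

If each list contains no duplicates, at most n pairs (e1, e2) can match, so the loop over c runs at most n times and the total work is O(n^2). With many duplicates the bound degrades back toward O(n^3).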

Answer 1

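The comments have already clarified the Big-Oh notation, so I will start with testing the code.

Here is the setup I used to test the speed of the code.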

import random

# The function bodies are collapsed here because they are given in the question.
# def disjoint3c(A, B, C): ...
# def set_disjoint_medium(a, b, c): ...
# def set_disjoint_slowest(a, b, c): ...

# Python 2 code (xrange); on Python 3, use range instead.
a = [random.randrange(100) for i in xrange(10000)]
b = [random.randrange(100) for i in xrange(10000)]
c = [random.randrange(100) for i in xrange(10000)]

# Ran timeit.
# Results with the timeit module:
1-) 0.00635750419422
2-) 0.0061145967287
3-) 0.0487953200969

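Now for the results. As you can see, the O(n^3) solution runs 8 times slower than the other solutions. But that is still fast for such an algorithm (and it was even faster in your test). Why does this happen?

Because the medium and slowest solutions you used stop executing as soon as a common element is detected, the full complexity of the code is never realized: the function returns as soon as it finds an answer. Why did the slowest solution run almost as fast as the other ones in your test? Probably because it found the answer close to the beginning of the lists.

To test this, you can create the lists like this. Try it yourself.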

a = range(1000)
b = range(1000, 2000)
c = range(2000, 3000)

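Now the real difference between the times will be obvious, because the slowest solution has to run until it finishes all iterations, since there is no common element.

So this is a matter of worst-case versus best-case performance.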
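The comparison can be reproduced with a small script like the sketch below. It assumes Python 3 and repeats the three functions from the question so that it is self-contained. The helper time_all and the size n = 100 are choices made here for illustration; n is kept small on purpose, because the cubic version needs n^3 comparisons when the lists are disjoint.

```python
import random
import timeit


def disjoint3c(A, B, C):
    return len(set(A) & set(B) & set(C)) == 0


def set_disjoint_medium(a, b, c):
    a, b, c = set(a), set(b), set(c)
    for elem in a:
        if elem in b and elem in c:
            return False
    return True


def set_disjoint_slowest(a, b, c):
    for e1 in a:
        for e2 in b:
            for e3 in c:
                if e1 == e2 == e3:
                    return False
    return True


def time_all(a, b, c, repeats=3):
    """Return {function name: best time in seconds over `repeats` single runs}."""
    funcs = (disjoint3c, set_disjoint_medium, set_disjoint_slowest)
    return {
        f.__name__: min(timeit.repeat(lambda: f(a, b, c), number=1, repeat=repeats))
        for f in funcs
    }


n = 100  # small on purpose: the worst case costs n**3 comparisons

# Random lists over a small value range: a common element shows up early.
early_hit = [[random.randrange(100) for _ in range(n)] for _ in range(3)]

# Fully disjoint lists: every loop has to run to completion.
worst = [list(range(n)), list(range(n, 2 * n)), list(range(2 * n, 3 * n))]

print("random lists:  ", time_all(*early_hit))
print("disjoint lists:", time_all(*worst))
```

On the disjoint lists the triple loop cannot return early, so its time should grow roughly with n^3 while the set-based versions stay close to linear.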

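Not part of the question edit: so what if you want to keep the speed of finding early common occurrences, but also do not want to increase the complexity? I made a rough solution for that; maybe more experienced users can suggest faster code.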

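import itertools

def mysol(a, b, c):
    store = [set(), set(), set()]
    # zip_longest for Python3, not izip_longest.
    for i, j, k in itertools.izip_longest(a, b, c):
        # Compare against None (the padding value) so that falsy
        # elements such as 0 are still recorded.
        if i is not None: store[0].add(i)
        if j is not None: store[1].add(j)
        if k is not None: store[2].add(k)
        # Check each new element against the other two sets.
        if ((i in store[1] and i in store[2]) or
                (j in store[0] and j in store[2]) or
                (k in store[0] and k in store[1])):
            return False
    return True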

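What this code basically does is avoid converting all the lists to sets at the beginning. Instead, it iterates through all the lists at the same time, adds the elements to sets, and checks for common occurrences. So now you keep the speed of finding an early solution, but it is still slow for the worst case that I showed.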

As for speed, in the worst case this runs 3-4 times slower than your first two solutions. But it runs 4-10 times faster than those solutions on randomized lists.
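For readers on Python 3, the same idea can be sketched as follows. This is an adaptation, not the answer's original code: the name disjoint3_incremental and the _MISSING sentinel are introduced here for illustration. The sentinel is an assumption made so that falsy elements such as 0, elements equal to None, and lists of unequal length are all handled.

```python
from itertools import zip_longest

# Unique padding marker; it can never be equal to a real list element.
_MISSING = object()


def disjoint3_incremental(a, b, c):
    """Return True if no element is common to all three lists."""
    seen_a, seen_b, seen_c = set(), set(), set()
    for x, y, z in zip_longest(a, b, c, fillvalue=_MISSING):
        # Record the elements read in this step (skip padding).
        if x is not _MISSING:
            seen_a.add(x)
        if y is not _MISSING:
            seen_b.add(y)
        if z is not _MISSING:
            seen_c.add(z)
        # A common element is detected at the step where its last copy is read.
        if ((x in seen_b and x in seen_c) or
                (y in seen_a and y in seen_c) or
                (z in seen_a and z in seen_b)):
            return False
    return True


print(disjoint3_incremental([0, 1], [5, 0], [0]))                   # False
print(disjoint3_incremental(range(3), range(3, 6), range(6, 9)))    # True
```

The padding value is never added to any set, so the membership checks for an exhausted list are always false.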

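Note: the fact that you are finding all the common elements of the three lists (in the first solution) unquestionably means that there is a faster solution in theory. You only need to know whether there is even one common element, and that knowledge is enough.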
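One way to act on this note, as a minimal sketch rather than a tuned solution: build the intersection of the first two lists once, then let set.isdisjoint scan the third list, which in CPython stops at the first shared element. The function name disjoint3_short_circuit is hypothetical.

```python
def disjoint3_short_circuit(a, b, c):
    """Return True if no element is common to all three lists."""
    # Elements shared by a and b (this part is still computed in full).
    common_ab = set(a).intersection(b)
    # isdisjoint accepts any iterable, so c is never converted to a set.
    return common_ab.isdisjoint(c)


print(disjoint3_short_circuit([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # False
print(disjoint3_short_circuit([1, 2], [2, 3], [3, 4]))           # True
```

This still pays for the full intersection of a and b, so it only avoids part of the unnecessary work.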

Answer 2

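O notation ignores all constant factors, so it only describes behavior as the data set size tends to infinity. For any finite set, it is only a rule of thumb.

With interpreted languages such as Python and R, constant factors can be pretty large. They need to create and collect many objects, which is all O(1) but not free. So it is, unfortunately, quite common to see 100x performance differences between virtually equivalent pieces of code.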

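Secondly, the first algorithm computes all common elements, while the others stop at the first one. If you benchmark algX(a,a,a) (yes, all three sets being the same), then it will do much more work than the others!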
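This is easy to check with a sketch like the following, which assumes Python 3 and reuses two of the functions from the question. The list size and the number of timing runs are arbitrary choices made here.

```python
import timeit


def disjoint3c(A, B, C):
    return len(set(A) & set(B) & set(C)) == 0


def set_disjoint_slowest(a, b, c):
    for e1 in a:
        for e2 in b:
            for e3 in c:
                if e1 == e2 == e3:
                    return False
    return True


a = list(range(100000))

# Same list three times: the triple loop returns on its very first
# comparison, while the set version builds three sets and two intersections.
t_sets = timeit.timeit(lambda: disjoint3c(a, a, a), number=10)
t_loops = timeit.timeit(lambda: set_disjoint_slowest(a, a, a), number=10)

print("disjoint3c:          ", t_sets)
print("set_disjoint_slowest:", t_loops)
```

Here the "slow" cubic code does a constant amount of work, so the set-based code loses by a wide margin on this input.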

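I would not be surprised to see a sort-based O(n log n) algorithm be very competitive (because sorting is usually incredibly well optimized). For integers, I would use numpy arrays, and by avoiding the Python interpreter as much as possible you can get very fast code. While numpy's in1d and intersect1d may give you an O(n^2) or O(n^3) algorithm, they may still end up being faster as long as your sets are usually disjoint.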
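A sort-based variant could look like the sketch below, written in plain Python 3 rather than numpy so that it has no dependencies. The function name disjoint3_sorted is hypothetical, and the code assumes the elements are mutually comparable (for example, all integers). After sorting, three indices walk the lists in step, always advancing past values that are smaller than the largest current value.

```python
def disjoint3_sorted(a, b, c):
    """Return True if no element is common to all three lists."""
    a, b, c = sorted(a), sorted(b), sorted(c)  # O(n log n) in total
    i = j = k = 0
    while i < len(a) and j < len(b) and k < len(c):
        x, y, z = a[i], b[j], c[k]
        if x == y == z:
            return False
        # Any value below the current maximum cannot appear in the rest
        # of the list holding that maximum, so it can be skipped.
        largest = max(x, y, z)
        if x < largest:
            i += 1
        if y < largest:
            j += 1
        if z < largest:
            k += 1
    return True


print(disjoint3_sorted([5, 1, 9], [9, 2], [7, 9, 3]))  # False
print(disjoint3_sorted([1, 2], [2, 3], [3, 1]))        # True
```

The merge step is linear, so the sorting dominates the cost. Whether this beats the set-based versions in practice depends on the data and would need to be measured.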

Also note that in your case, the sets are not necessarily pairwise disjoint... algX(set(),a,a)==True.
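The difference between "no element common to all three" and "pairwise disjoint" can be seen in a short sketch. The helper pairwise_disjoint is hypothetical and is only there for contrast with the question's disjoint3c.

```python
def disjoint3c(A, B, C):
    """True if no element is common to all three lists."""
    return len(set(A) & set(B) & set(C)) == 0


def pairwise_disjoint(A, B, C):
    """Hypothetical helper: True only if no two of the lists share an element."""
    sa, sb, sc = set(A), set(B), set(C)
    return sa.isdisjoint(sb) and sa.isdisjoint(sc) and sb.isdisjoint(sc)


a = [1, 2, 3]
print(disjoint3c([], a, a))         # True: nothing is in all three
print(pairwise_disjoint([], a, a))  # False: the second and third lists overlap
```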


python

algorithm

complexity-theory