为什么我的代码在 pypy 中比默认的 python 解释器慢？

Question

给定多个玩家 n，我需要找到 H，所有元组的列表，其中每个元组是联盟的组合（玩家的，例如（1,2, 3) 是参与者 1、2 和 3 的联盟。（(1,2,3),(4,5),(6,)）是联盟的组合 - 也是元组）遵守此规则：每个玩家只出现且恰好出现一次（即只出现在一个联盟中）。 P.S。联盟的每一种组合在代码中称为布局。

一开始我写了一个片段，其中我计算了所有联盟的所有组合，并为每个组合检查了规则。问题是，对于 5-6 名玩家来说，联盟组合的数量已经太多了，以至于我的电脑坏了。为了避免大部分计算（所有可能的组合、循环和 ifs），我写了以下内容（我测试过，它等同于前面的代码片段）：

from itertools  import combinations, combinations_with_replacement, product, permutations

players = range(1,n+1)
coalitions = [[coal for coal in list(combinations(players,length))] for length in players]

H = [tuple(coalitions[0]),(coalitions[-1][0],)]
combs = [comb for length in xrange(2,n) for comb in combinations_with_replacement(players,length) if sum(comb) == n]
perms = list(permutations(players))
layouts = set(frozenset(frozenset(perm[i:i+x]) for (i,x) in zip([0]+[sum(comb[:y]) for y in xrange(1,len(comb))],comb)) for comb in combs for perm in perms)
H.extend(tuple(tuple(tuple(coal) for coal in layout) for layout in layouts))
print H

解释：假设 n = 3

首先我创建所有可能的联盟：

coalitions = [[(1,),(2,),(3,)],[(1,2),(1,3),(2,3)],[(1,2,3)]]

然后我用明显的组合初始化 H：每个玩家在他自己的联盟中，每个玩家在最大的联盟中。

H = [((1,),(2,),(3,)),((1,2,3),)]

然后我计算所有可能的布局形式：

combs = [(1,2)]   #(1,2) represents a layout in which there is 
                  #one 1-player coalition and one 2-player coalition.

我计算排列（perms）。最后，对于每个烫发和每个梳子，我都会计算不同的可能布局。我set结果（layouts）为了删除重复项并添加到H。

H = [((1,),(2,),(3,)),((1,2,3),),((1,2),(3,)),((1,3),(2,)),((2,3),(1,))]

比较如下：

python script.py

4: 0.000520944595337 秒
5: 0.0038321018219 秒
6: 0.0408189296722 秒
7: 0.431486845016 秒
8: 6.05224680901 秒
9: 76.4520540237 秒

pypy script.py

4: 0.00342392921448 秒
5: 0.0668039321899 秒
6: 0.311077833176 秒
7: 1.13124799728 秒
8: 11.5973010063 秒
9: 去 phut

为什么 pypy 这么慢？我应该改变什么？

Answer 1

这是 itertools.product 上的 Pypy 问题。

https://bitbucket.org/pypy/pypy/issues/1677/itertoolsproduct-slower-than-nested-fors

Note that our goal is to ensure that itertools is not massively slower than plain Python, but we don't really care about making it exactly as fast (or faster) as plain Python. As long as it's not massively slower, it's fine. (At least I don't agree with you about whether a) or b) is easier to read :-)

没有详细研究您的代码，看起来它大量使用了 itertools 组合、排列和乘积函数。在常规 CPython 中，它们是用编译后的 C 代码编写的，目的是使它们更快。 Pypy 没有实现 C 代码，因此这些函数速度较慢也就不足为奇了。

Answer 2

首先，我想指出您已经在学习 Bell numbers, which might ease the next part of your work, after you're done generating all the subsets. For example, it's easy to know how large each Bell set will be; OEIS has the sequence of Bell numbers。

我手写了循环来生成铃集；这是我的代码：

cache = {0: (), 1: ((set([1]),),)}

def bell(x):
    # Change these lines to alter memoization.
    if x in cache:
        return cache[x]
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    cache[x] = tuple(new)
    return new

出于实用目的，我在这里包含了一些记忆。但是，通过注释掉一些代码并编写一些其他代码，您可以获得以下未记忆的版本，我将其用于基准测试：

def bell(x):
    if x == 0:
        return ()
    if x == 1:
        return ((set([1]),),)
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    cache[x] = tuple(new)
    return new

我的数据基于我大部分工作都使用的一台有几年历史的 Thinkpad。大多数较小的案例都太快而无法可靠地测量（前几次每次试验甚至不到一毫秒），所以我的基准测试正在测试 bell(9) 到 bell(11)。

CPython 2.7.11 的基准测试，使用标准 timeit 模块：

$ python -mtimeit -s 'from derp import bell' 'bell(9)'
10 loops, best of 3: 31.5 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 176 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 1.07 sec per loop

在 PyPy 4.0.1 上，也使用 timeit:

$ pypy -mtimeit -s 'from derp import bell' 'bell(9)'
100 loops, best of 3: 14.3 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 90.8 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 675 msec per loop

因此，我得出的结论是 itertools 在您尝试在其预期习语之外使用它时速度不是很快。铃号的组合很有趣，但它们并非自然地来自我能找到的 itertools 小部件的任何简单组合。

针对您关于如何使其更快的原始查询的回应：只需打开代码即可。希望这对您有所帮助！

~ C.

为什么我的代码在 pypy 中比默认的 python 解释器慢？

Why is my code slower in pypy than the default python interpreter?

python

performance

pypy