使用 XOR 加速 Python2 嵌套循环

Question

of the question this is marked duplicate of is wrong and does not satisfy my needs.

我的代码旨在根据一系列数字计算哈希。

矩阵形式的结构更容易理解。如果我有 16 个数字，从 29 开始，结构将是：(start=29, length=4)

29, 30, 31, 32,
33, 34, 35, 36,
37, 38, 39, 40,
41, 42, 43, 44

给定的算法指定哈希将是粗体数字的异或：

29, 30, 31, 32, //,
33, 34, 35, //, 36,
37, 38, //, 39, 40,
41, //, 42, 43, 44

哈希=29^30^31^32^33^34^35^37^38^41=34

我的代码是：

def answer(start, length):
    val=0
    c=0
    for i in range(length):
        for j in range(length):
            if j < length-i:
                val^=start+c
            c+=1
    return val

计算像 answer(2000000000,10**4) 这样的大值所需的时间太多了。

限制条件：

Py2.7.6
只有标准库，除了 bz2、crypt、fcntl、mmap、pwd、pyexpat、select、signal、termios、thread、time、unicodedata、zipimport、zlib。
计算时间有限。

当前正在计算测试参数（我不知道）给我一个超时错误。

如何提高我的代码速度以获得更大的值？

Answer 1

看起来你可以替换内部循环，如果用：

for j in range(length - i) val^=start+c c+=1 c+=i 当我变大时，这应该可以节省一些时间

抱歉，我现在无法测试！

Answer 2

恐怕，根据您在 answer(2000000000,10**4) 中的输入，您将永远无法完成 "in time"。

你可以通过改进内部循环获得相当显着的加速，而不是每次都更新 c 变量并使用 xrange 而不是 range，像这样：

def answer(start, length):
    val=0
    c=0
    for i in range(length):
        for j in range(length):
            if j < length-i:
                val^=start+c
            c+=1
    return val


def answer_fast(start, length):
    val = 0
    c = 0
    for i in xrange(length):
        for j in xrange(length - i):
            if j < length - i:
                val ^= start + c + j
        c += length
    return val


# print answer(10, 20000)
print answer_fast(10, 20000)

探查器显示 answer_fast 大约快两倍：

> python -m cProfile script.py
366359392
        20004 function calls in 46.696 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   46.696   46.696 script.py:1(<module>)
        1   44.357   44.357   46.696   46.696 script.py:1(answer)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    20001    2.339    0.000    2.339    0.000 {range}

> python -m cProfile script.py
366359392
        3 function calls in 26.274 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   26.274   26.274 script.py:1(<module>)
        1   26.274   26.274   26.274   26.274 script.py:12(answer_fast)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

但是如果你想要大幅加速（数量级），你应该考虑在 Cython 中重写你的函数。

这是它的 "cythonized" 版本：

def answer(int start, int length):
    cdef int val = 0, c = 0, i, j
    for i in xrange(length):
        for j in xrange(length - i):
            if j < length - i:
                val ^= start + c + j
        c += length
    return val

使用与上述相同的输入参数，用时不到 200 毫秒而不是 20 多秒，这是 100 倍的加速。

> ipython

In [1]: import pyximport; pyximport.install()
Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x7f3fed983150>)

In [2]: import script2

In [3]: timeit script2.answer(10, 20000)
10 loops, best of 3: 188 ms per loop

根据你输入的参数，需要58ms：

In [5]: timeit script2.answer(2000000000,10**4)
10 loops, best of 3: 58.2 ms per loop

Answer 3

是已接受的答案中的错误：递减 l 需要在 [=32= 之前完成] 异或计算。这是一个修复后的版本，以及一个 assert 测试来验证它给出的结果与原始算法相同。

def f(a): return (a, 1, a + 1, 0)[a % 4] def getXor(a, b): return f(b) ^ f(a-1) def gen_nums(start, length): l = length ans = 0 while l > 0: l = l - 1 ans ^= getXor(start, start + l) start += length return ans def answer(start, length): c = val = 0 for i in xrange(length): for j in xrange(length - i): n = start + c + j #print '%d,' % n, val ^= n #print c += length return val for start in xrange(50): for length in xrange(100): a = answer(start, length) b = gen_nums(start, length) assert a == b, (start, length, a, b)

在 start 和 length 的范围内，gen_nums 比 answer 快大约 5 倍，但我们可以使它再次快两倍（即, 通过消除这些函数调用，速度大约是 answer) 的 10 倍：

def gen_nums(start, length): ans = 0 for l in xrange(length - 1, -1, -1): b = start + l ans ^= (b, 1, b + 1, 0)[b % 4] ^ (start - 1, 1, start, 0)[start % 4] start += length return ans

正如 Mirek Opoka 在评论中提到的，% 4 等同于 & 3，而且速度更快，因为按位运算比执行整数除法和丢弃商更快。所以我们可以把核心步骤换成

ans ^= (b, 1, b + 1, 0)[b & 3] ^ (start - 1, 1, start, 0)[start & 3]

使用 XOR 加速 Python2 嵌套循环

Speed up Python2 nested loops with XOR

python

performance

xor

nested-loops

python-2.7