Way to efficiently calculate XOR(^) checksum based on a list of IDs

Question

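While googling for information about Python list comprehensions, I was offered a Google foobar challenge, which I have been slowly working through for fun over the past few days. The latest challenge:

It effectively asks you to generate a list of IDs, ignoring an increasing number of IDs on each new line until only one ID is left. You are then supposed to XOR(^) the IDs to produce a checksum. I wrote a working program that outputs the correct answer, but it is not efficient enough to pass all the test cases in the allotted time (it passes 6/10). A length of 50,000 should produce a result in under 20 seconds, but it takes 320.

Could someone steer me in the right direction, but please don't do it for me; I'm having fun pushing myself with this challenge. Maybe there is a data structure or algorithm I could implement to speed up the calculation?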

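Logic behind the code:

First, the starting ID and the length are taken
A list of IDs is generated, ignoring an increasing number of IDs on each line, starting with ignoring 0 on the first line.
All the numbers in the list of IDs are XORed using a for loop
The answer is returned as an int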

import timeit
def answer(start,length):
    x = start
    lengthmodified = length
    answerlist = []
    for i in range (0,lengthmodified): #Outer for loop runs a number of times equal to the variable "length".
        prestringresult = 0
        templist = []
        for y in range (x,x + length): #Fills list with ids for new line
            templist.append(y)
        for d in range (0,lengthmodified): #Ignores an id from each line, increasing by one with each line, and starting with 0 for the first
            answerlist.append(templist[d])
        lengthmodified -= 1
        x += length    
        for n in answerlist: #XORs all of the numbers in the list via a loop and saves to prestringresult
            prestringresult ^= n
        stringresult = str(prestringresult) 
        answerlist = [] #Empties list
        answerlist.append(int(stringresult)) #Adds the result of XORing all of the numbers in the list to the answer list
    #print(answerlist[0]) #Print statement allows value that's being returned to be checked, just uncomment it
    return (answerlist[0]) #Returns Answer



#start = timeit.default_timer()
answer(17,4)
#stop = timeit.default_timer()
#print (stop - start)

Answer 1

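Neither templist nor answerlist is really needed. Let's go through your code and see how to eliminate them.

First, let's turn the initialization of templist into a one-liner. This: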

templist = []
for y in range (x,x + length):
    templist.append(y)

becomes this:

templist = list(range(x, x + length))

Then let's do the same thing for answerlist. This:

for d in range (0,lengthmodified):
    answerlist.append(templist[d])

becomes this:

answerlist.extend(templist[:lengthmodified])

Now let's look at how they are used later on. If we ignore lengthmodified -= 1 and x += length for the moment, we have:

templist = list(range(x, x + length))
answerlist.extend(templist[:lengthmodified])

for n in answerlist:
    prestringresult ^= n

answerlist = []

Instead of extending answerlist, iterating over it, and then clearing it, it would be faster to just iterate over templist.

templist = list(range(x, x + length))

for n in templist[:lengthmodified]:
    prestringresult ^= n

Now templist isn't needed either, so let's skip building it, too.

for n in range(x, x + lengthmodified):
    prestringresult ^= n

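Both templist and answerlist are gone.

The only missing piece here is getting answerlist.append(int(stringresult)) to work again. I'll leave that for you to figure out.

Overall, the lesson here is to avoid explicit for loops whenever possible. Writing lots of for loops that iterate over containers is a C way of thinking. In Python there are often ways to chew through a whole collection at once. Doing so lets you take advantage of the language's fast built-in operations.

As a bonus, idiomatic Python is also easier to read.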
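As one possible illustration of handing the loop to built-ins, the XOR of a run of consecutive IDs can be folded with functools.reduce and operator.xor. This is only a sketch: xor_range is a hypothetical helper name that does not appear in the question, and it still visits every ID, so it tidies the code without changing the overall complexity.

```python
from functools import reduce
from operator import xor


def xor_range(first, count):
    """XOR of the `count` consecutive integers starting at `first`.

    Hypothetical helper for illustration; the trailing 0 makes an
    empty range fold to 0 instead of raising.
    """
    return reduce(xor, range(first, first + count), 0)


# First row of answer(17, 4): the IDs 17, 18, 19 and 20.
print(xor_range(17, 4))
```

It could stand in for the inner `for n in range(x, x + lengthmodified)` loop shown above.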

Answer 2

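You probably need a different approach, not just minor improvements like John's. I just wrote a solution that does answer(0, 50000) in about 2 seconds on my PC. I still go row by row, but instead of XORing all the numbers in the row's range, I work bit by bit. How many numbers in the row have the 1-bit set? [*] An odd number of them? Then I flip the 1-bit of my answer. Then the same for the 2-bit, the 4-bit, the 8-bit, and so on, up to the 2^30-bit. So for each row it's just 31 small calculations (instead of actually XORing tens of thousands of numbers).

[*] Can be computed quickly, in constant time, just from the start/stop of the range.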

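Edit: Since you asked for another hint, here is how to compute how often the 1-bit is set in some range(a, b). Compute how often it is set in range(0, a) and subtract that from how often it is set in range(0, b). It's easier if the range starts at zero. How often is the 1-bit set in some range(0, c)? Easy: c//2 times. So how often is the 1-bit set in some range(a, b)? Simply b//2 - a//2 times. The higher bits are similar, just a little more complicated.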
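A minimal sketch of how this hint might extend to the higher bits, under my own assumptions: count_bit_set and xor_of_range are hypothetical names, and the period-based counting is one way to do it, not necessarily the code this answer's author wrote. Bit k repeats with period 2^(k+1) and is set in the upper half of each period.

```python
def count_bit_set(c, k):
    """How many n in range(0, c) have bit k set (assumes c >= 0)."""
    period = 1 << (k + 1)
    full_periods, remainder = divmod(c, period)
    # Each full period holds 2**k numbers with bit k set; in the
    # partial period the bit is set from offset 2**k onwards.
    return full_periods * (1 << k) + max(0, remainder - (1 << k))


def xor_of_range(a, b, bits=31):
    """XOR of all n in range(a, b), assuming 0 <= a <= b < 2**bits."""
    result = 0
    for k in range(bits):
        # Bit k of the XOR is 1 exactly when an odd number of values set it.
        if (count_bit_set(b, k) - count_bit_set(a, k)) % 2:
            result |= 1 << k
    return result
```

For k = 0 the count reduces to c // 2, which matches the formula in the hint.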

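Edit 2: Oh wait, I just remembered... there is a simple way to compute the XOR of all numbers in some range(a, b). Again split the work into range(0, a) and range(0, b). The XOR of all numbers in some range(0, c) is easy, because there is a nice pattern (you'll see it if you do it for all c from 0 to, say, 30). Using that, I now solve answer(0, 50000) in about 0.04 seconds.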
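To look at that pattern yourself, something like the following would print the running XOR for each c. It is only a sketch: prefix_xors is a hypothetical name, and the initial keyword of itertools.accumulate requires Python 3.8 or later.

```python
from itertools import accumulate
from operator import xor


def prefix_xors(limit):
    """List whose entry c is the XOR of all n in range(0, c), for c = 0..limit."""
    return list(accumulate(range(limit), xor, initial=0))


for c, value in enumerate(prefix_xors(30)):
    print(c, value)
```

Reading the output in groups of four consecutive values of c makes the repetition easy to spot.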

Answer 3

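I could get a small improvement by not using lists, but it would still fail on large numbers. The nested loops kill the speed. I think you need to follow Pochmann's logic, since brute force is rarely the way to go for this type of problem.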

Answer 4

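Most people get Time limit exceeded on this problem. I did! The question can be summarized like this: "Find the XOR of all the numbers that lie within a certain range in constant time." Yes, constant time!

So for 3-6, the answer should be 3^4^5^6 = 4, in O(1) time.

Solution: XOR is associative and commutative, so A^B^C can be written as B^A^C. We also know what XOR means: two identical bits give 0 and two different bits give 1, so any value XORed with itself is 0.

From these two properties, the XOR of all the numbers from 3 to 6 can be written as: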

3^4^5^6 = (0^1^2)^(0^1^2) ^ (3^4^5^6)
        = (0^1^2^3^4^5^6) ^ (0^1^2) (this comes from the associative nature of xor)
        = XOR of all the numbers from (0-6) ^ XOR of all the numbers from (0-2) ...eq(1)

So if we can find the XOR of all the numbers from 0 to some integer in constant time, we have our answer.

Luckily, there is a pattern:

See this example:

(0-1): 0 ^ 1 = 1 (1)
(0-2): 0 ^ 1 ^ 2 = 3 (2+1)
(0-3): 0 ^ 1 ^ 2 ^ 3 = 0 (0)
(0-4): 0 ^ 1 ^ 2 ^ 3 ^ 4 = 4 (4)

(0-5): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 = 1 (1)
(0-6): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 = 7 (6+1)
(0-7): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 ^  7 = 0 (0)
(0-8): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 ^ 7 ^ 8 = 8 (8)


So the pattern for finding the xor of all the integers from 0 to n is:
if n%4 == 1 then, answer = 1
if n%4 == 2 then, answer = n+1
if n%4 == 3 then, answer = 0
if n%4 == 0 then answer = n 

Therefore, XOR(0-6) becomes 7 (since 6%4 ==2) and XOR(0-2) becomes 3 (since 2%4 ==2)

Therefore, the eq(1) now becomes:
3^4^5^6 = 7 ^ 3 = 4
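A small sketch of eq(1) combined with the n%4 pattern, using hypothetical helper names (xor_upto, xor_between) chosen for illustration. It assumes 0 <= lo <= hi and relies on Python's modulo returning 3 for -1 % 4, so that xor_upto(-1) evaluates to 0, the XOR of no numbers.

```python
def xor_upto(n):
    """XOR of all integers 0..n inclusive, picked by n % 4."""
    return [n, 1, n + 1, 0][n % 4]


def xor_between(lo, hi):
    """XOR of all integers lo..hi inclusive, following eq(1)."""
    return xor_upto(hi) ^ xor_upto(lo - 1)


print(xor_between(3, 6))  # the 3^4^5^6 example worked through above
```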

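Now the problem is simple. Most of us got stuck on the time limit exceeded error because we tried to XOR inside every loop, which becomes huge as the number of inputs/iterations grows. Here is my working solution in Python, which passed all of Google's test cases: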

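#Main Program
def answer(start, length):
    checkSum = 0
    # Each row keeps l IDs, with l going length, length-1, ..., 1
    for l in range(length, 0, -1):
        # XOR of start..start+l-1 is XOR(0..start+l-1) ^ XOR(0..start-1)
        checkSum = checkSum ^ (getXor(start + l-1) ^ getXor(start-1))
        start = start + length  # first ID of the next row
    return checkSum

def getXor(x):
    # XOR of all integers from 0 to x, selected by x % 4 (see the pattern above)
    result = [x, 1, x+1, 0]
    return result[x % 4]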
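When experimenting with a fast version like this, it can help to compare it against a plain brute-force reference on small inputs. The sketch below is an assumption-laden illustration: checksum_bruteforce and checksum_fast are hypothetical names, and checksum_fast is a snake_case restatement of the approach above rather than the poster's exact code.

```python
def checksum_bruteforce(start, length):
    """Reference: XOR every kept ID, dropping `row` IDs from the end of each row."""
    total = 0
    for row in range(length):
        first = start + row * length
        for worker_id in range(first, first + length - row):
            total ^= worker_id
    return total


def xor_upto(n):
    """XOR of 0..n inclusive via the n % 4 pattern; gives 0 for n == -1."""
    return [n, 1, n + 1, 0][n % 4]


def checksum_fast(start, length):
    """Constant work per row, using XOR(0..hi) ^ XOR(0..lo-1)."""
    total = 0
    for row in range(length):
        first = start + row * length
        kept = length - row
        total ^= xor_upto(first + kept - 1) ^ xor_upto(first - 1)
    return total


# Compare both versions on a grid of small inputs.
for s in range(0, 20):
    for n in range(1, 12):
        assert checksum_fast(s, n) == checksum_bruteforce(s, n)
```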
