找到给出超过 65 个素数的最低 collatz 序列

Question

我有一个任务，需要在 Python 中找到包含超过 65 个素数的最低 Collatz 序列。

例如，19 的 Collatz 序列是：

19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1

这个序列包含7个质数。

我还需要使用记忆功能，因此不必运行和 "year" 就能找到它。我找到了用于记忆 Collatz 序列的代码，但是当我只需要素数时，我不知道如何让它工作。

这是我找到的 Collatz 记忆代码：

lookup = {}
def countTerms(n):
    if n not in lookup:
        if n == 1:
            lookup[n] = 1
        elif not n % 2:
            lookup[n] = countTerms(n / 2)[0] + 1
        else:
            lookup[n] = countTerms(n*3 + 1)[0] + 1

    return lookup[n], n

这是我的 prime 测试仪：

def is_prime(a):
    for i in xrange(2,a):
        if a%i==0:
            #print a, " is not a prime number"
            return False
    if a==1:
        return False
    else:
        return True

Answer 1

您现有的代码缩进不正确。我假设此任务是一项家庭作业，所以我不会 post 一个完整的工作解决方案，但我会给你一些有用的片段。

首先，这里有一个稍微更有效的素性测试器。它不是测试所有小于 a 的数字是否都是 a 的因数，而是测试 a.

的平方根

def is_prime(a):
    for i in xrange(2, int(1 + a ** 0.5)):
        if a % i == 0:
            return False
    return True

请注意，此函数 returns True 用于 a = 1。没关系，因为你不需要测试 1: 你可以将它预加载到 lookup dict:

lookup = {1:0}

您的 countTerms 函数需要稍微修改，以便在当前 n 为质数时，它只将 lookup 计数加一。在Python中，False的数值为0，True的数值为1。这里很方便：

def count_prime_terms(n):
    ''' Count the number of primes terms in a Collatz sequence '''
    if n not in lookup:
        if n % 2:
            next_n = n * 3 + 1
        else:
            next_n = n // 2

        lookup[n] = count_prime_terms(next_n) + is_prime(n)
    return lookup[n]

我已经将函数名称更改为更 Pythonic.

FWIW，包含 65 个或更多素数的第一个 Collatz 序列实际上包含 67 个素数。它的种子数超过 180 万，在检查直到该种子的所有序列时需要进行素性测试的最高数字是 151629574372。完成时，lookup 字典包含 3920492 个条目。

为了回应 James Mills 关于递归的评论，我编写了一个非递归版本，并且为了更容易看出迭代和递归版本都产生相同的结果，我 post 一个完整的工作程序。我在上面说过我不打算这样做，但我认为现在应该可以这样做，因为 spørreren 已经使用我在原始答案中提供的信息编写了他们的程序。

我完全同意避免递归是好的，除非递归适用于问题域（例如，树遍历）。 Python 不鼓励递归 - 它不能优化尾调用递归并且它强加了递归深度限制（尽管如果需要可以修改该限制）。

这个 Collatz 序列素数计数算法自然是递归陈述的，但迭代地做起来并不难 - 我们只需要一个列表来临时保存序列，同时确定其所有成员的素数。的确，这个列表占用了 RAM，但它（可能）比递归版本所需的堆栈帧要求更有效 space-wise。

递归版本在解决OP中的问题时达到了343的递归深度。这完全在默认限制内，但仍然不好，如果您想搜索包含更多素数的序列，您将达到该限制。

迭代和递归版本运行速度大致相同（至少，它们在我的机器上这样做）。要解决 OP 中所述的问题，他们都需要不到 2 分钟的时间。这比我原来的解决方案快得多，主要是由于素数测试的优化。

基本的 Collatz 序列生成步骤已经需要确定数字是奇数还是偶数。显然，如果我们已经知道一个数是偶数，那么就没有必要测试它是否是素数。 :) 而且我们还可以消除 is_prime 函数中偶数因子的测试。我们可以通过简单地将 2 的结果加载到 lookup 缓存中来处理 2 是素数的事实。

相关说明，在搜索包含给定数量的素数的第一个序列时，我们不需要测试任何以偶数开始的序列。偶数（2 除外）不会增加序列的质数计数，并且由于此类序列中的第一个奇数将小于我们当前的数字，因此其结果将已经在 lookup 缓存中，假设我们从 3 开始搜索。如果我们不从 3 开始搜索，我们只需要确保我们的起始种子足够低，这样我们就不会意外错过包含所需素数的第一个序列。采用此策略不仅减少了所需时间，还减少了查找缓存中的条目数。

#!/usr/bin/env python

''' Find the 1st Collatz sequence containing a given number of prime terms

    From 

    Written by PM 2Ring 2015.04.29

    [Seed == 1805311, prime count == 67]
'''

import sys

def is_prime(a):
    ''' Test if odd `a` >= 3 is prime '''
    for i in xrange(3, int(1 + a ** 0.5), 2):
        if not a % i:
            return 0
    return 1


#Track the highest number generated so far; use a list
# so we don't have to declare it as global...
hi = [2]

#Cache for sequence prime counts. The key is the sequence seed,
# the value is the number of primes in that sequence.
lookup = {1:0, 2:1}

def count_prime_terms_iterative(n):
    ''' Count the number of primes terms in a Collatz sequence 
        Iterative version '''
    seq = []
    while n not in lookup:
        if n > hi[0]:
            hi[0] = n

        if n % 2:
            seq.append((n, is_prime(n)))
            n = n * 3 + 1
        else:
            seq.append((n, 0))
            n = n // 2

    count = lookup[n]
    for n, isprime in reversed(seq):
        count += isprime
        lookup[n] = count

    return count

def count_prime_terms_recursive(n):
    ''' Count the number of primes terms in a Collatz sequence
        Recursive version '''
    if n not in lookup:
        if n > hi[0]:
            hi[0] = n

        if n % 2:
            next_n = n * 3 + 1
            isprime = is_prime(n)
        else:
            next_n = n // 2
            isprime = 0
        lookup[n] = count_prime_terms(next_n) + isprime

    return lookup[n]


def find_seed(numprimes, start):
    ''' Find the seed of the 1st Collatz sequence containing
        `numprimes` primes, starting from odd seed `start` '''
    i = start
    mcount = 0

    print 'seed, prime count, highest term, dict size'
    while mcount < numprimes:
        count = count_prime_terms(i)
        if count > mcount:
            mcount = count
            print i, count, hi[0], len(lookup)
        i += 2


#count_prime_terms = count_prime_terms_recursive
count_prime_terms = count_prime_terms_iterative

def main():
    if len(sys.argv) > 1:
        numprimes = int(sys.argv[1])
    else:
        print 'Usage: %s numprimes [start]' % sys.argv[0]
        exit()

    start = int(sys.argv[2]) if len(sys.argv) > 2 else 3 

    #Round `start` up to the next odd number
    if start % 2 == 0:
        start += 1

    find_seed(numprimes, start)


if __name__ == '__main__':
    main()

当运行和

$ ./CollatzPrimes.py 65

输出是

seed, prime count, highest term, dict size
3 3 16 8
7 6 52 18
19 7 160 35
27 25 9232 136
97 26 9232 230
171 28 9232 354
231 29 9232 459
487 30 39364 933
763 32 250504 1626
1071 36 250504 2197
4011 37 1276936 8009
6171 43 8153620 12297
10971 44 27114424 21969
17647 48 27114424 35232
47059 50 121012864 94058
99151 51 1570824736 198927
117511 52 2482111348 235686
202471 53 17202377752 405273
260847 55 17202377752 522704
481959 59 24648077896 966011
963919 61 56991483520 1929199
1564063 62 151629574372 3136009
1805311 67 151629574372 3619607

Answer 2

让我们从一个确定数字是否为质数的函数开始；我们将使用 Miller-Rabin 算法，它比我们将要处理的数字大小的试验除法更快：

from random import randint

def isPrime(n, k=5): # miller-rabin
    if n < 2: return False
    for p in [2,3,5,7,11,13,17,19,23,29]:
        if n % p == 0: return n == p
    s, d = 0, n-1
    while d % 2 == 0:
        s, d = s+1, d/2
    for i in range(k):
        x = pow(randint(2, n-1), d, n)
        if x == 1 or x == n-1: continue
        for r in range(1, s):
            x = (x * x) % n
            if x == 1: return False
            if x == n-1: break
        else: return False
    return True

因为我们想找到第一个个满足条件的数字，我们的策略是从 2 开始，一路向上，边走边存储结果。我们用 Collatz 序列中从 0 到 2 的素数来填充缓存（这是一个糟糕的双关语，抱歉）：

primeCount = [0,0,1]

函数pCount(n)计算n的Collatz序列中的素数。一旦序列 k 的当前值低于 n，我们就会在缓存中查找结果。在那之前，我们测试 Collatz 序列中的每个奇数的素数，并在适当时增加素数计数 p。当我们有 n 的素数时，我们将它添加到缓存中并 return 它。

def pCount(n):
    k, p = n, 0
    while k > 0:
        if k < n:
            t = p + primeCount[k]
            primeCount.append(t)
            return t
        elif k % 2 == 0:
            k = k / 2
        elif isPrime(k):
            p = p + 1
            k = 3*k + 1
        else:
            k = 3*k + 1

现在只需计算每个 n 的质数个数，从 3 开始依次计算，当质数数超过 65 时停止：

n = 3
t = pCount(n)
while t < 65:
    n = n + 1
    t = pCount(n)

这不会花很长时间，在我的电脑上不到一分钟。结果如下：

print n

结果中有67个素数。如果你想看到它们，这里有一个简单的函数，可以打印给定 n:

的 Collatz 序列

def collatz(n):
    cs = []
    while n != 1:
        cs.append(n)
        n = 3*n+1 if n&1 else n//2
    cs.append(1)
    return cs

这是素数列表：

filter(isPrime,collatz(n))

多么有趣的休闲数学题！

编辑： 由于人们询问了 Miller-Rabin 素性测试器，让我展示这个基于 2,3,5 轮的简单素性测试器；它用 2、3 和 5 以及不是 2、3 或 5 的倍数的数字进行试除，其中包括一些复合数，因此用质数试除效率稍低，但没有必要预先计算并存储素数，因此使用起来更容易。

def isPrime(n): # 2,3,5-wheel
    if n < 2: return False
    wheel = [1,2,2,4,2,4,2,4,6,2,6]
    w, next = 2, 0
    while w * w <= n:
        if n % w == 0: return False
        w = w + wheel[next]
        next = next + 1
        if next > 10: next = 3
    return True

说 filter(isPrime,range(1000000)) 在短短几秒钟内识别出小于一百万的 78498 个素数。您可能想要比较此素数测试器与 Miller-Rabin 测试器的计时，并根据运行时间效率确定交叉点的位置；我没有做任何正式的计时，但它似乎解决了 65-collatz-prime 问题的时间与 Miller-Rabin teseter 大致相同。或者，您可能希望将试验师与 2、3、5 轮结合到某个限制，比如 1000 或 10000 或 25000，然后运行 Miller-Rabin 对幸存者；可以快速消除大多数复合材料，因此它可以运行非常快。

找到给出超过 65 个素数的最低 collatz 序列

finding the lowest collatz sequence that gives more that 65 primes

python

primes

memoization

sequence

collatz

找到给出超过 65 个素数的最低 collat​​z 序列

finding the lowest collatz sequence that gives more that 65 primes

python

primes

memoization

sequence

collatz

找到给出超过 65 个素数的最低 collatz 序列