将所有数字分解为给定数字的快速算法

Fast Algorithm to Factorize All Numbers Up to a Given Number

我正在寻找一种算法,该算法可以根据已经分解的数字分解数字。换句话说,我正在寻找一种快速算法来将所有数字分解为给定数字,并将它们存储在(我想这是最容易使用的数据结构)列表/元组元组中。我正在寻找一种“最多 n”的算法,因为我需要最多“n”的所有数字,我想这比一个一个地检查要快。

我希望这个算法在合理的时间内(少于一个小时)工作 2*10^8,对于我的程序 运行。我在 python 中尝试了一种更天真的方法,首先找到直到“n”的所有素数,然后对于每个数字“k”,通过检查每个素数直到被除以找到它的素因数分解(我们将调用它 p),那么它的因式分解就是 k/p + p.

的因式分解
from math import *
max=1000000 # We will check all numbers up to this number, 

lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
  if lst[i - 2]:
    for j in range(i ** 2, max, i):
      lst[j - 2] = False

PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")

FACTORS = [(0,),(1,)] #This will be a list of tuples where FACTORS[i] = the prime factors of i
for c in range(2,max): #check all numbers until max
  if c in PRIMES:
    FACTORS.append((c,)) #If it's a prime just add it in
  else: #if it's not a prime...
    i=0
    while PRIMES[i]<= c: #Run through all primes until you find one that divides it,
      if c%PRIMES[i] ==0: 
        FACTORS.append(FACTORS[c//PRIMES[i]] + (PRIMES[i],)) #If it does, add the prime with the factors of the division
        break
      i+=1

从测试来看,绝大多数时间都浪费在检查候选者是否为素数之后的 else 部分。这需要的不仅仅是我们的 for max = 200000000


P.S。 - 我用这个做什么 - 不重要

我运行这个程序是为了找到最小的“n”,使得对于某个“a”使得 (2n)!/((n+a)!^2) 是一个整数。基本上,我定义了 a_n = 最小的 k 使得 (2k)!/((k+n)!^2) 是一个整数。原来,a_1 =0,a_2 = 208,a_3 = 3475,a_4 = 8174,a_5 = 252965,a_6 = 3648835 , a_7 = 72286092. 顺便说一下,我注意到 a_n + n 是无平方的,尽管无法从数学上证明它。使用勒让德公式:https://en.wikipedia.org/wiki/Legendre%27s_formula,我写了这段代码:

from math import *
from bisect import bisect_right
max=100000000 # We will check all numbers up to this number, 

lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
  if lst[i - 2]:
    for j in range(i ** 2, max, i):
      lst[j - 2] = False

PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
print("START")

def v(p,m):
  return sum([ (floor(m/(p**i))) for i in range(1,1+ceil(log(m,p)))]) #This checks for the max power of prime p, so that p**(v(p,m)) divides factorial(m)

def check(a,n): #This function checks if a number n competes the criteria for a certain a
  if PRIMES[bisect_right(PRIMES, n)]<= n + a: #First, it is obvious that if there is a prime between n+1 and n+a the criteria isn't met
    return False
  i=0
  while PRIMES[i] <= n: #We will run through the primes smaller than n... THIS IS THE ROOM FOR IMPROVEMENT - instead of checking all the primes, check all primes that divide (n+1),(n+2),...,(n+a)
    if v(PRIMES[i],2*n)<2*v(PRIMES[i],n+a): # If any prime divides the denominator more than the numerator, the fraction is obviously not a whole number
      return False
    i+=1
  return True #If for all primes less than n, the numerator has a bigger max power of p than the denominator, the fraction is a whole number.

#Next, is a code that will just make sure that the program runs all numbers in order, and won't repeat anything.

start = 0 #start checking from this value
for a in range(1,20): #check for these values of a.
  j=start
  while not check(a,j):
    if j%100000==0:
      print("LOADING ", j) #just so i know how far the program has gotten.
    j+=1
  print("a-",a," ",j) #We found a number. great. print the result.
  start=j #start from this value again, because the check obviously won't work for smaller values with a higher "a"

您可以使用脚本的第一部分来做到这一点!

代码:

from math import *
import time

MAX = 40000000

t = time.time()
# factors[i] = all the prime factors of i
factors = {}
# Running over all the numbers smaller than sqrt(MAX) since they can be the factors of MAX
for i in range(2, int(sqrt(MAX) + 1)):
    # If this number has already been factored - it is not prime
    if i not in factors:
        # Find all the future numbers that this number will factor
        for j in range(i * 2, MAX, i):
            if j not in factors:
                factors[j] = [i]
            else:
                factors[j].append(i)
print(time.time() - t)

for i in range(3, 15):
    if i not in factors:
        print(f"{i} is prime")
    else:
        print(f"{i}: {factors[i]}")

结果:

3: is prime
4: [2]
5: is prime
6: [2, 3]
7: is prime
8: [2]
9: [3]
10: [2, 5]
11: is prime
12: [2, 3]
13: is prime
14: [2, 7]

解释:

如评论中所述,它是对 Sieve of Eratosthenes 算法的修改。
对于每个数字,我们找到它可以在未来因式分解的所有数字。
如果该数字未出现在结果字典中,则它是素数,因为没有数字将其分解。 我们正在使用字典而不是列表,因此根本不需要保存质数 - 这对内存更友好但也有点慢。

时间:

根据对 MAX = 40000000time.time() 的简单检查:110.14351892471313 秒。
对于 MAX = 10000001.0785243511199951 秒。
对于 MAX = 200000000time.time():1.5 小时后未完成...它已达到主循环中 6325 项中的第 111 项(这还不错,因为循环越远,它们变得越短).

不过我相信一个写得很好的 C 代码可以在半小时内完成(如果你愿意考虑它,我可能会写另一个答案)。可以做的更多优化是使用多线程和一些像 Miller–Rabin 这样的素数测试。当然值得一提的是,这些结果是在我的笔记本电脑上,也许在 PC 或专用机器上它会 运行 更快或更慢。

编辑:

我实际上问了一个 question in code review 关于这个答案,它有一些关于 运行 时间的很酷的图表!

编辑#2:

有人回答了我的问题,现在代码可以 运行 在 2.5 秒内进行一些修改。


由于之前的答案是在Python中写的,所以速度很慢。以下代码的作用完全相同,但在 C++ 中,它有一个线程每 10 秒监控一次获取到哪个素数。

#include <math.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <ctime>
#include <thread>
#include <iostream>
#include <atomic>

#ifndef MAX
#define MAX 200000000
#define TIME 10
#endif


std::atomic<bool> exit_thread_flag{false};

void timer(int *i_ptr) {
    for (int i = 1; !exit_thread_flag; i++) {
        sleep(TIME);
        if (exit_thread_flag) {
            break;
        }
        std::cout << "i = " << *i_ptr << std::endl;
        std::cout << "Time elapsed since start: " 
                  << i * TIME 
                  << " Seconds" << std::endl;
    }
}

int main(int argc, char const *argv[])
{
    int i, upper_bound, j;
    std::time_t start_time;
    std::thread timer_thread;
    std::vector< std::list< int > > factors;

    std::cout << "Initiallizating" << std::endl;
    start_time = std::time(nullptr);
    timer_thread = std::thread(timer, &i);
    factors.resize(MAX);
    std::cout << "Initiallization took " 
              << std::time(nullptr) - start_time 
              << " Seconds" << std::endl;

    std::cout << "Starting calculation" << std::endl;
    start_time = std::time(nullptr);
    upper_bound = sqrt(MAX) + 1;
    for (i = 2; i < upper_bound; ++i)
    {
        if (factors[i].empty())
        {
            for (j = i * 2; j < MAX; j += i)
            {
                factors[j].push_back(i);
            }
        }
    }
    std::cout << "Calculation took " 
              << std::time(nullptr) - start_time 
              << " Seconds" << std::endl;

    // Closing timer thread
    exit_thread_flag = true;

    std::cout << "Validating results" << std::endl;
    for (i = 2; i < 20; ++i)
    {
        std::cout << i << ": ";
        if (factors[i].empty()) {
            std::cout << "Is prime";
        } else {
            for (int v : factors[i]) {
                std::cout << v << ", ";
            }
        }
        std::cout << std::endl;
    }
    
    timer_thread.join();
    return 0;
}

需要用以下行编译:

g++ main.cpp -std=c++0x -pthread

如果您不想将整个代码转换为 C++,您可以使用 Python 中的子进程库。


时间:

好吧,我尽力了,但它仍然 运行s 在一个多小时内......它已经达到 6619 这是第 855 个素数(好多了!)在 1.386111 小时(4990 秒) ).所以这是一个进步,但还有一段路要走! (没有另一个线程可能会更快)