将所有数字分解为给定数字的快速算法
Fast Algorithm to Factorize All Numbers Up to a Given Number
我正在寻找一种算法,该算法可以根据已经分解的数字分解数字。换句话说,我正在寻找一种快速算法来将所有数字分解为给定数字,并将它们存储在(我想这是最容易使用的数据结构)列表/元组元组中。我正在寻找一种“最多 n”的算法,因为我需要最多“n”的所有数字,我想这比一个一个地检查要快。
我希望这个算法在合理的时间内(少于一个小时)工作 2*10^8,对于我的程序 运行。我在 python 中尝试了一种更天真的方法,首先找到直到“n”的所有素数,然后对于每个数字“k”,通过检查每个素数直到被除以找到它的素因数分解(我们将调用它 p),那么它的因式分解就是 k/p + p.
的因式分解
from math import *
max=1000000 # We will check all numbers up to this number,
lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
if lst[i - 2]:
for j in range(i ** 2, max, i):
lst[j - 2] = False
PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
FACTORS = [(0,),(1,)] #This will be a list of tuples where FACTORS[i] = the prime factors of i
for c in range(2,max): #check all numbers until max
if c in PRIMES:
FACTORS.append((c,)) #If it's a prime just add it in
else: #if it's not a prime...
i=0
while PRIMES[i]<= c: #Run through all primes until you find one that divides it,
if c%PRIMES[i] ==0:
FACTORS.append(FACTORS[c//PRIMES[i]] + (PRIMES[i],)) #If it does, add the prime with the factors of the division
break
i+=1
从测试来看,绝大多数时间都浪费在检查候选者是否为素数之后的 else 部分。这需要的不仅仅是我们的 for max = 200000000
P.S。 - 我用这个做什么 - 不重要
我运行这个程序是为了找到最小的“n”,使得对于某个“a”使得 (2n)!/((n+a)!^2) 是一个整数。基本上,我定义了 a_n = 最小的 k 使得 (2k)!/((k+n)!^2) 是一个整数。原来,a_1 =0,a_2 = 208,a_3 = 3475,a_4 = 8174,a_5 = 252965,a_6 = 3648835 , a_7 = 72286092. 顺便说一下,我注意到 a_n + n 是无平方的,尽管无法从数学上证明它。使用勒让德公式:https://en.wikipedia.org/wiki/Legendre%27s_formula,我写了这段代码:
from math import *
from bisect import bisect_right
max=100000000 # We will check all numbers up to this number,
lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
if lst[i - 2]:
for j in range(i ** 2, max, i):
lst[j - 2] = False
PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
print("START")
def v(p,m):
return sum([ (floor(m/(p**i))) for i in range(1,1+ceil(log(m,p)))]) #This checks for the max power of prime p, so that p**(v(p,m)) divides factorial(m)
def check(a,n): #This function checks if a number n competes the criteria for a certain a
if PRIMES[bisect_right(PRIMES, n)]<= n + a: #First, it is obvious that if there is a prime between n+1 and n+a the criteria isn't met
return False
i=0
while PRIMES[i] <= n: #We will run through the primes smaller than n... THIS IS THE ROOM FOR IMPROVEMENT - instead of checking all the primes, check all primes that divide (n+1),(n+2),...,(n+a)
if v(PRIMES[i],2*n)<2*v(PRIMES[i],n+a): # If any prime divides the denominator more than the numerator, the fraction is obviously not a whole number
return False
i+=1
return True #If for all primes less than n, the numerator has a bigger max power of p than the denominator, the fraction is a whole number.
#Next, is a code that will just make sure that the program runs all numbers in order, and won't repeat anything.
start = 0 #start checking from this value
for a in range(1,20): #check for these values of a.
j=start
while not check(a,j):
if j%100000==0:
print("LOADING ", j) #just so i know how far the program has gotten.
j+=1
print("a-",a," ",j) #We found a number. great. print the result.
start=j #start from this value again, because the check obviously won't work for smaller values with a higher "a"
您可以使用脚本的第一部分来做到这一点!
代码:
from math import *
import time
MAX = 40000000
t = time.time()
# factors[i] = all the prime factors of i
factors = {}
# Running over all the numbers smaller than sqrt(MAX) since they can be the factors of MAX
for i in range(2, int(sqrt(MAX) + 1)):
# If this number has already been factored - it is not prime
if i not in factors:
# Find all the future numbers that this number will factor
for j in range(i * 2, MAX, i):
if j not in factors:
factors[j] = [i]
else:
factors[j].append(i)
print(time.time() - t)
for i in range(3, 15):
if i not in factors:
print(f"{i} is prime")
else:
print(f"{i}: {factors[i]}")
结果:
3: is prime
4: [2]
5: is prime
6: [2, 3]
7: is prime
8: [2]
9: [3]
10: [2, 5]
11: is prime
12: [2, 3]
13: is prime
14: [2, 7]
解释:
如评论中所述,它是对 Sieve of Eratosthenes 算法的修改。
对于每个数字,我们找到它可以在未来因式分解的所有数字。
如果该数字未出现在结果字典中,则它是素数,因为没有数字将其分解。
我们正在使用字典而不是列表,因此根本不需要保存质数 - 这对内存更友好但也有点慢。
时间:
根据对 MAX = 40000000
和 time.time()
的简单检查:110.14351892471313
秒。
对于 MAX = 1000000
:1.0785243511199951
秒。
对于 MAX = 200000000
和 time.time()
:1.5 小时后未完成...它已达到主循环中 6325 项中的第 111 项(这还不错,因为循环越远,它们变得越短).
不过我相信一个写得很好的 C 代码可以在半小时内完成(如果你愿意考虑它,我可能会写另一个答案)。可以做的更多优化是使用多线程和一些像 Miller–Rabin 这样的素数测试。当然值得一提的是,这些结果是在我的笔记本电脑上,也许在 PC 或专用机器上它会 运行 更快或更慢。
编辑:
我实际上问了一个 question in code review 关于这个答案,它有一些关于 运行 时间的很酷的图表!
编辑#2:
有人回答了我的问题,现在代码可以 运行 在 2.5 秒内进行一些修改。
由于之前的答案是在Python
中写的,所以速度很慢。以下代码的作用完全相同,但在 C++
中,它有一个线程每 10 秒监控一次获取到哪个素数。
#include <math.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <ctime>
#include <thread>
#include <iostream>
#include <atomic>
#ifndef MAX
#define MAX 200000000
#define TIME 10
#endif
std::atomic<bool> exit_thread_flag{false};
void timer(int *i_ptr) {
for (int i = 1; !exit_thread_flag; i++) {
sleep(TIME);
if (exit_thread_flag) {
break;
}
std::cout << "i = " << *i_ptr << std::endl;
std::cout << "Time elapsed since start: "
<< i * TIME
<< " Seconds" << std::endl;
}
}
int main(int argc, char const *argv[])
{
int i, upper_bound, j;
std::time_t start_time;
std::thread timer_thread;
std::vector< std::list< int > > factors;
std::cout << "Initiallizating" << std::endl;
start_time = std::time(nullptr);
timer_thread = std::thread(timer, &i);
factors.resize(MAX);
std::cout << "Initiallization took "
<< std::time(nullptr) - start_time
<< " Seconds" << std::endl;
std::cout << "Starting calculation" << std::endl;
start_time = std::time(nullptr);
upper_bound = sqrt(MAX) + 1;
for (i = 2; i < upper_bound; ++i)
{
if (factors[i].empty())
{
for (j = i * 2; j < MAX; j += i)
{
factors[j].push_back(i);
}
}
}
std::cout << "Calculation took "
<< std::time(nullptr) - start_time
<< " Seconds" << std::endl;
// Closing timer thread
exit_thread_flag = true;
std::cout << "Validating results" << std::endl;
for (i = 2; i < 20; ++i)
{
std::cout << i << ": ";
if (factors[i].empty()) {
std::cout << "Is prime";
} else {
for (int v : factors[i]) {
std::cout << v << ", ";
}
}
std::cout << std::endl;
}
timer_thread.join();
return 0;
}
需要用以下行编译:
g++ main.cpp -std=c++0x -pthread
如果您不想将整个代码转换为 C++,您可以使用 Python 中的子进程库。
时间:
好吧,我尽力了,但它仍然 运行s 在一个多小时内......它已经达到 6619
这是第 855 个素数(好多了!)在 1.386111 小时(4990 秒) ).所以这是一个进步,但还有一段路要走! (没有另一个线程可能会更快)
我正在寻找一种算法,该算法可以根据已经分解的数字分解数字。换句话说,我正在寻找一种快速算法来将所有数字分解为给定数字,并将它们存储在(我想这是最容易使用的数据结构)列表/元组元组中。我正在寻找一种“最多 n”的算法,因为我需要最多“n”的所有数字,我想这比一个一个地检查要快。
我希望这个算法在合理的时间内(少于一个小时)工作 2*10^8,对于我的程序 运行。我在 python 中尝试了一种更天真的方法,首先找到直到“n”的所有素数,然后对于每个数字“k”,通过检查每个素数直到被除以找到它的素因数分解(我们将调用它 p),那么它的因式分解就是 k/p + p.
的因式分解from math import *
max=1000000 # We will check all numbers up to this number,
lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
if lst[i - 2]:
for j in range(i ** 2, max, i):
lst[j - 2] = False
PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
FACTORS = [(0,),(1,)] #This will be a list of tuples where FACTORS[i] = the prime factors of i
for c in range(2,max): #check all numbers until max
if c in PRIMES:
FACTORS.append((c,)) #If it's a prime just add it in
else: #if it's not a prime...
i=0
while PRIMES[i]<= c: #Run through all primes until you find one that divides it,
if c%PRIMES[i] ==0:
FACTORS.append(FACTORS[c//PRIMES[i]] + (PRIMES[i],)) #If it does, add the prime with the factors of the division
break
i+=1
从测试来看,绝大多数时间都浪费在检查候选者是否为素数之后的 else 部分。这需要的不仅仅是我们的 for max = 200000000
P.S。 - 我用这个做什么 - 不重要
我运行这个程序是为了找到最小的“n”,使得对于某个“a”使得 (2n)!/((n+a)!^2) 是一个整数。基本上,我定义了 a_n = 最小的 k 使得 (2k)!/((k+n)!^2) 是一个整数。原来,a_1 =0,a_2 = 208,a_3 = 3475,a_4 = 8174,a_5 = 252965,a_6 = 3648835 , a_7 = 72286092. 顺便说一下,我注意到 a_n + n 是无平方的,尽管无法从数学上证明它。使用勒让德公式:https://en.wikipedia.org/wiki/Legendre%27s_formula,我写了这段代码:
from math import *
from bisect import bisect_right
max=100000000 # We will check all numbers up to this number,
lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
if lst[i - 2]:
for j in range(i ** 2, max, i):
lst[j - 2] = False
PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
print("START")
def v(p,m):
return sum([ (floor(m/(p**i))) for i in range(1,1+ceil(log(m,p)))]) #This checks for the max power of prime p, so that p**(v(p,m)) divides factorial(m)
def check(a,n): #This function checks if a number n competes the criteria for a certain a
if PRIMES[bisect_right(PRIMES, n)]<= n + a: #First, it is obvious that if there is a prime between n+1 and n+a the criteria isn't met
return False
i=0
while PRIMES[i] <= n: #We will run through the primes smaller than n... THIS IS THE ROOM FOR IMPROVEMENT - instead of checking all the primes, check all primes that divide (n+1),(n+2),...,(n+a)
if v(PRIMES[i],2*n)<2*v(PRIMES[i],n+a): # If any prime divides the denominator more than the numerator, the fraction is obviously not a whole number
return False
i+=1
return True #If for all primes less than n, the numerator has a bigger max power of p than the denominator, the fraction is a whole number.
#Next, is a code that will just make sure that the program runs all numbers in order, and won't repeat anything.
start = 0 #start checking from this value
for a in range(1,20): #check for these values of a.
j=start
while not check(a,j):
if j%100000==0:
print("LOADING ", j) #just so i know how far the program has gotten.
j+=1
print("a-",a," ",j) #We found a number. great. print the result.
start=j #start from this value again, because the check obviously won't work for smaller values with a higher "a"
您可以使用脚本的第一部分来做到这一点!
代码:
from math import *
import time
MAX = 40000000
t = time.time()
# factors[i] = all the prime factors of i
factors = {}
# Running over all the numbers smaller than sqrt(MAX) since they can be the factors of MAX
for i in range(2, int(sqrt(MAX) + 1)):
# If this number has already been factored - it is not prime
if i not in factors:
# Find all the future numbers that this number will factor
for j in range(i * 2, MAX, i):
if j not in factors:
factors[j] = [i]
else:
factors[j].append(i)
print(time.time() - t)
for i in range(3, 15):
if i not in factors:
print(f"{i} is prime")
else:
print(f"{i}: {factors[i]}")
结果:
3: is prime
4: [2]
5: is prime
6: [2, 3]
7: is prime
8: [2]
9: [3]
10: [2, 5]
11: is prime
12: [2, 3]
13: is prime
14: [2, 7]
解释:
如评论中所述,它是对 Sieve of Eratosthenes 算法的修改。
对于每个数字,我们找到它可以在未来因式分解的所有数字。
如果该数字未出现在结果字典中,则它是素数,因为没有数字将其分解。
我们正在使用字典而不是列表,因此根本不需要保存质数 - 这对内存更友好但也有点慢。
时间:
根据对 MAX = 40000000
和 time.time()
的简单检查:110.14351892471313
秒。
对于 MAX = 1000000
:1.0785243511199951
秒。
对于 MAX = 200000000
和 time.time()
:1.5 小时后未完成...它已达到主循环中 6325 项中的第 111 项(这还不错,因为循环越远,它们变得越短).
不过我相信一个写得很好的 C 代码可以在半小时内完成(如果你愿意考虑它,我可能会写另一个答案)。可以做的更多优化是使用多线程和一些像 Miller–Rabin 这样的素数测试。当然值得一提的是,这些结果是在我的笔记本电脑上,也许在 PC 或专用机器上它会 运行 更快或更慢。
编辑:
我实际上问了一个 question in code review 关于这个答案,它有一些关于 运行 时间的很酷的图表!
编辑#2:
有人回答了我的问题,现在代码可以 运行 在 2.5 秒内进行一些修改。
由于之前的答案是在Python
中写的,所以速度很慢。以下代码的作用完全相同,但在 C++
中,它有一个线程每 10 秒监控一次获取到哪个素数。
#include <math.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <ctime>
#include <thread>
#include <iostream>
#include <atomic>
#ifndef MAX
#define MAX 200000000
#define TIME 10
#endif
std::atomic<bool> exit_thread_flag{false};
void timer(int *i_ptr) {
for (int i = 1; !exit_thread_flag; i++) {
sleep(TIME);
if (exit_thread_flag) {
break;
}
std::cout << "i = " << *i_ptr << std::endl;
std::cout << "Time elapsed since start: "
<< i * TIME
<< " Seconds" << std::endl;
}
}
int main(int argc, char const *argv[])
{
int i, upper_bound, j;
std::time_t start_time;
std::thread timer_thread;
std::vector< std::list< int > > factors;
std::cout << "Initiallizating" << std::endl;
start_time = std::time(nullptr);
timer_thread = std::thread(timer, &i);
factors.resize(MAX);
std::cout << "Initiallization took "
<< std::time(nullptr) - start_time
<< " Seconds" << std::endl;
std::cout << "Starting calculation" << std::endl;
start_time = std::time(nullptr);
upper_bound = sqrt(MAX) + 1;
for (i = 2; i < upper_bound; ++i)
{
if (factors[i].empty())
{
for (j = i * 2; j < MAX; j += i)
{
factors[j].push_back(i);
}
}
}
std::cout << "Calculation took "
<< std::time(nullptr) - start_time
<< " Seconds" << std::endl;
// Closing timer thread
exit_thread_flag = true;
std::cout << "Validating results" << std::endl;
for (i = 2; i < 20; ++i)
{
std::cout << i << ": ";
if (factors[i].empty()) {
std::cout << "Is prime";
} else {
for (int v : factors[i]) {
std::cout << v << ", ";
}
}
std::cout << std::endl;
}
timer_thread.join();
return 0;
}
需要用以下行编译:
g++ main.cpp -std=c++0x -pthread
如果您不想将整个代码转换为 C++,您可以使用 Python 中的子进程库。
时间:
好吧,我尽力了,但它仍然 运行s 在一个多小时内......它已经达到 6619
这是第 855 个素数(好多了!)在 1.386111 小时(4990 秒) ).所以这是一个进步,但还有一段路要走! (没有另一个线程可能会更快)