查找幻数 C++
Find Magic Numbers C++
幻数
一个正整数是“神奇的”,当且仅当它可以通过重复除以 2(如果是偶数)或乘以 3 然后加 1(如果是奇数)来减少到 1。因此,例如,3 是神奇的,因为 3 首先减少到 10 (3*3+1),然后减少到 5 (10/2),然后减少到 16 (5*3+1),然后减少到 8 (16/2) ,然后是 4 (8/2),然后是 2 (4/2),最后是 1 (2/2)。幻数假说指出所有正整数都是幻数,或者,正式地说:∀x ∈ Z, MAGIC(x) 其中 MAGIC(x) 是谓词“x is magic”。
我们应该开发一个 C++ 程序来找到 "Magic Numbers" 从 1 到 50 亿。如果正确完成,应该需要 80 秒或更短的时间。我的大约需要 2 小时,而我们得到的示例程序需要 14 天。这是我的代码,我该怎么做才能加快速度?我是否遗漏了明显的优化问题?
#include <iostream>
using namespace std;
bool checkMagic(unsigned long number);
int main()
{
for(unsigned long i = 1; i <= 5000000000; i++)
{
if(i % 1000000 == 0)
{
//only print out every 1000000 numbers to keep track, but slow it down
//by printing out every single number
cout<<i<<endl;
}
if(!checkMagic(i))
{
//not magic
cout<<i<<" not a magic number"<<endl;
break;
}
}
}
bool checkMagic(unsigned long number)
{
if(number % 2 == 0)
{
number = number / 2;
}
else
{
number = (number * 3) + 1;
}
if(number != 1)
{
checkMagic(number);
}
return 1;
}
这道题主要是想验证Collatz Conjecture up to 5B.
我认为这里的关键是,对于我们正在检查的每个数字n,查看乐观情景和悲观情景,并在恢复到乐观情景之前检查乐观情景悲观者。
在乐观的情况下,我们根据 n / 2 修改 n ; 3n + 1规则,数字序列将:
在有限的步骤中变得小于 n(在这种情况下,我们可以检查我们对那个较小数字的了解)。
不会造成步骤溢出
(正如 TonyK 正确指出的那样,我们不能依赖(甚至不依赖第一个))。
所以,对于乐观的场景,我们可以使用下面的函数:
#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>
using namespace std;
using found_set = unordered_set<size_t>;
bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
size_t tries = 0;
size_t n = i;
while(n != 1) {
if(++tries == max_tries )
return false;
if(n < i)
return found.empty() || found.find(i) == found.end();
if(n % 2 == 0)
n /= 2;
else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
return false;
}
return true;
}
注意以下几点:
该函数仅尝试验证其接收到的数字的猜想。如果是returnstrue
,就已经验证过了。如果returnsfalse
,那只是说明它还没有被验证(即不代表它已经被反驳了)。
它带一个参数max_tries
,最多只验证这个步数。如果超过了这个数字,它不会尝试辨别这是否是无限循环的一部分 - 它只是 returns 验证失败。
它需要一个unordered_set
个失败的已知数(当然,如果Collatz猜想成立,这个集合将永远是空的)。
它通过 __builtin_*_overflow
检测溢出。 (不幸的是,这是特定于 gcc 的。不同的平台可能需要一组不同的此类函数。)
现在是缓慢但可靠的功能。这个函数使用了GNU MP multi-precision arithmetic library。它通过维护已经遇到的数字序列来检查无限循环。这个函数returns true
if猜想对这个数已经证明,false
if对这个数反驳(注意和之前快速验证的区别)
bool slow_check(size_t i) {
mpz_t n_;
mpz_init(n_);
mpz_t rem_;
mpz_init(rem_);
mpz_t i_;
mpz_init(i_);
mpz_set_ui(i_, i);
mpz_set_ui(n_, i);
list<mpz_t> seen;
while(mpz_cmp_ui(n_, 1) != 0) {
if(mpz_cmp_ui(n_, i) < 0)
return true;
mpz_mod_ui(rem_, n_, 2);
if(mpz_cmp_ui(rem_, 0) == 0) {
mpz_div_ui(n_, n_, 2);
}
else {
mpz_mul_ui(n_, n_, 3);
mpz_add_ui(n_, n_, 1);
}
seen.emplace_back(n_);
for(const auto &e0: seen)
for(const auto &e1: seen)
if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
return false;
}
return true;
}
最后,main
维护了一个 unordered_set
被证伪的数字。对于每个数字,它乐观地尝试验证猜想,然后,如果它失败(溢出或超过迭代次数),使用慢速方法:
int main()
{
const auto max_num = 5000000000;
found_set found;
for(size_t i = 1; i <= max_num; i++) {
if(i % 1000000000 == 0)
cout << "iteration " << i << endl;
auto const f = fast_verify(i, max_num, found);
if(!f && !slow_check(i))
found.insert(i);
}
for(auto e: found)
cout << e << endl;
}
运行 完整代码(下方)给出:
$ g++ -O3 --std=c++11 magic2.cpp -lgmp && time ./a.out
iteration 1000000000
iteration 2000000000
iteration 3000000000
iteration 4000000000
iteration 5000000000
real 1m3.933s
user 1m3.924s
sys 0m0.000s
$ uname -a
Linux atavory 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ sudo lshw | grep -i cpu
*-cpu
description: CPU
product: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
bus info: cpu@0
version: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts cpufreq
也就是说,没有发现被证实的数字,运行 时间在 ~64 秒。
完整代码:
#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>
using namespace std;
using found_set = unordered_set<size_t>;
bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
size_t tries = 0;
size_t n = i;
while(n != 1) {
if(++tries == max_tries )
return false;
if(n < i)
return found.empty() || found.find(i) == found.end();
if(n % 2 == 0)
n /= 2;
else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
return false;
}
return true;
}
bool slow_check(size_t i) {
mpz_t n_;
mpz_init(n_);
mpz_t rem_;
mpz_init(rem_);
mpz_t i_;
mpz_init(i_);
mpz_set_ui(i_, i);
mpz_set_ui(n_, i);
list<mpz_t> seen;
while(mpz_cmp_ui(n_, 1) != 0) {
if(mpz_cmp_ui(n_, i) < 0)
return true;
mpz_mod_ui(rem_, n_, 2);
if(mpz_cmp_ui(rem_, 0) == 0) {
mpz_div_ui(n_, n_, 2);
}
else {
mpz_mul_ui(n_, n_, 3);
mpz_add_ui(n_, n_, 1);
}
seen.emplace_back(n_);
for(const auto &e0: seen)
for(const auto &e1: seen)
if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
return false;
}
return true;
}
int main()
{
const auto max_num = 5000000000;
found_set found;
for(size_t i = 1; i <= max_num; i++) {
if(i % 1000000000 == 0)
cout << "iteration " << i << endl;
auto const f = fast_verify(i, max_num, found);
if(!f && !slow_check(i))
found.insert(i);
}
for(auto e: found)
cout << e << endl;
return 0;
}
您的代码和 Ami Tavory 的回答完全忽略了溢出问题。如果您的范围内 是 一个非幻数,那么 Collatz 迭代要么进入循环,要么无限增加。如果它无限增加,它肯定会溢出,不管你的 unsigned long
有多大;如果它进入一个循环,它很可能在到达那里之前溢出。所以你必须在那里进行溢出检查:
#include <limits.h>
...
if (number > (ULONG_MAX - 1) / 3)
PrintAnApologyAndDie() ;
number = (number * 3) + 1;
what can I do to speed it up? Am I missing obvious optimization issues?
- for(unsigned long i = 1; i <= 5000000000; i++)
for 循环不是调用方法 5B 次的最快编码风格。
在我的代码中,请参阅 unwind100() 和 innerLoop()。我编写的 unwind'ing 消除了 99% 的 innerLoop 开销,节省了超过 5 秒(在我的桌面 DD5 上)。您的储蓄可能会有所不同。维基百科有一篇关于展开循环的文章,其中简要描述了节省的潜力。
- if (i % 1000000 == 0)
此代码显然会生成进度报告。耗资50亿
比较,它对幻数检查没有任何作用。
查看我的代码 outerLoop,它调用了 10 次 innerLoop。这提供了一个
每个外循环的 1 个进度报告的方便位置。考虑拆分
你的循环变成 'innerLoop' 和 'outerLoop'。测试 50 亿次是
比向 innerLoops 添加 10 次调用更昂贵。
- 如评论中所述,无论测试结果如何,您的 checkMagic() return 都是“1”。一个简单的错误,但我不知道这个错误是否影响性能。
我认为您的 checkMagic() 没有实现尾递归,这确实会降低您的代码速度。
查看我的方法 checkMagicR(),它具有尾递归,并且 returns 为真或
假的。
此外,在您的 CheckMagic() 中,在奇数输入的子句中,您忽略了当 n 为正数且为奇数时 (3n+1) 必须为偶数的想法。我的代码执行 'early-div-by-2',从而尽可能减少未来的工作量。
- 在我的方法 unwind10() 中,请注意,我的代码通过使用 checkMagicRO()(对于奇数输入)和checkMagicRE()(偶数)。
在我的早期版本中,代码浪费时间检测已知偶数输入是否为偶数,然后将其除以 2,然后 return 为真。 checkMagicRE() 跳过测试和划分,节省时间。
CheckMagicRO() 跳过奇数测试,然后执行奇数并继续进入 checkMagicR()。
- if(number != 1) { checkMagic(number); }
您的递归一直持续到 1 ...没有必要,并且可能对整体性能造成最大的打击。
查看我的 checkMagicR() "recursion termination clause"。我的代码在 ((n/2) < m_N) 时停止,m_N 是启动此递归的测试值。这个想法是所有较小的测试输入都已经被证明是神奇的,否则代码就会断言。当环绕即将到来时它也会停止......这还没有发生。
结果:
我的桌面 (DD5) 更旧、更慢,并且 运行 Ubuntu 15.10 (64)。此代码是使用 g++ v5.2.1 开发的,并且仅在没有超时断言的情况下使用 -O3 完成。
DD5 显然比 Amy Tavory 使用的机器慢 2 倍(比较他的代码 运行 在 DD5 上的表现)。
在 DD5 上,我的代码在模式 1 中完成大约 44 秒,在模式 2 中完成 22 秒。
更快的机器可能会在 11 秒内完成此代码(在模式 2 中)。
完整代码:
// V9
#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>
// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock Clock_t;
class T431_t
{
private:
uint64_t m_N; // start-of-current-recursion
const uint64_t m_upperLimitN; // wrap-around avoidance / prevention
std::stringstream m_ssMagic; // dual threads use separate out streams
uint64_t m_MaxRcrsDpth; // diag only
uint64_t m_Max_n; // diag only
uint64_t m_TotalRecurse; // diag only
Clock_t::time_point m_startMS; // Derived req - progress info
std::stringstream m_ssDuration;
std::ostream* m_so; // dual threads - std::cout or m_ssMagic
uint64_t m_blk;
const int64_t m_MaxThreadDuration; // single Max80KMs, dual Max40KMs
// enum of known type (my preference for 'internal' constants)
enum T431_Constraints : uint64_t
{
ZERO = 0,
B00 = 1, // least-significant-bit is bit 0
ONE = 1, // lsbit set
TWO = 2,
THREE = 3,
//
K = 1000, // instead of 1024,
//
BlkSize = ((K * K * K) / 2), // 0.5 billion
//
MaxSingleCoreMS = (80 * K), // single thread max
MaxDualCoreMS = (40 * K) // dual thread max
};
// convenience method: show both hex and decimal on one line (see dtor)
inline std::string showHexDec(const std::string& label, uint64_t val) {
std::stringstream ss;
ss << "\n" << label
<< " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
<< " (" << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
return (ss.str());
}
// duration: now() - m_startMS
std::chrono::milliseconds getChronoMillisecond() {
return (std::chrono::duration_cast<std::chrono::milliseconds>
(Clock_t::now() - m_startMS));
}
// reports milliseconds since m_startMS time
void ssDurationPut (const std::string& label)
{
m_ssDuration.str(std::string()); // empty string
m_ssDuration.clear(); // clear ss flags
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
m_ssDuration
<< label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec";
}
inline void reportProgress()
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
//
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
*m_so << " m_blk = " << m_blk
<< " m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
<< " " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
<< std::endl;
}
public:
T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
m_N (TWO), // start val of current recursion
m_upperLimitN (initWrapAroundLimit()),
// m_ssMagic // default ctor ok - empty
m_MaxRcrsDpth (ZERO),
m_Max_n (TWO),
m_TotalRecurse (ZERO),
m_startMS (Clock_t::now()),
// m_ssDuration // default ctor ok - empty
m_so (&std::cout), // default
m_blk (ZERO),
m_MaxThreadDuration (maxDuration)
{
m_ssMagic.str().reserve(1024*2);
m_ssDuration.str().reserve(256);
}
~T431_t()
{
// m_MaxThreadDuration
// m_blk
// m_so
// m_ssDuration
// m_startMS
// m_Max_n
// m_TotalRecurse
// m_MaxRcrsDpth
if (m_MaxRcrsDpth) // diag only
{
ssDurationPut ("\n duration = " );
std::cerr << "\n T431() dtor "
//
<< showHexDec(" u64MAX",
std::numeric_limits<uint64_t>::max()) << "\n"
//
<< showHexDec(" m_Max_n", m_Max_n)
<< showHexDec(" (3*m_Max_n)+1", ((3*m_Max_n)+1))
<< showHexDec(" upperLimitN", m_upperLimitN)
// ^^^^^^^^^^^ no wrap-around possible when n is less
//
<< "\n"
<< showHexDec(" m_MaxRcrsDpth", m_MaxRcrsDpth)
<< "\n"
<< showHexDec(" m_TotalRecurse", m_TotalRecurse)
//
<< m_ssDuration.str() << std::endl;
}
// m_ssMagic
// m_upperLimitN
// m_N
if(m_MaxThreadDuration == MaxSingleCoreMS)
std::cerr << "\n " << __FILE__ << " by Douglas O. Moen Compiled on = "
<< __DATE__ << " " << __TIME__ << "\n";
}
private:
inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
{
return(true); // for unit test, comment out this line to run the asserts
// unit test - enable both asserts
assert(nE == m_N); // confirm recursion start value
assert(ZERO == (nE & B00)); // confirm even
return(true); // nothing more to do
}
inline bool checkMagicRO(uint64_t nO) // n is known odd, and the recursion starting point
{
// unit test - uncomment the following line to measure checkMagicRE() performance
// return (true); // when both methods do nothing - 0.044
// unit test - enable both these asserts
// assert(nO == m_N); // confirm recursion start value
// assert(nO & B00); // confirm odd
uint64_t nxt;
{
assert(nO < m_upperLimitN); // assert that NO wrap-around occurs
// NOTE: the sum of 4 odd uint's is even, and is destined to be
// divided by 2 in the next recursive invocation.
// we need not wait to divide by 2
nxt = ((nO+nO+nO+ONE) >> ONE); // combine the arithmetic
// |||||||||||| ||||||
// sum is even unknown after sum divided by two
}
return (checkMagicR(nxt));
}
inline void unwind8()
{
// skip 0, 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind10()
{
assert(checkMagicRE (m_N)); m_N++; // 0
assert(checkMagicRO (m_N)); m_N++; // 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind98()
{
unwind8();
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void unwind100()
{
unwind10(); // 0
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void innerLoop (uint64_t start_N, uint64_t end_N)
{
for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
{
unwind100();
}
reportProgress();
}
inline void outerLoop(const std::vector<uint64_t>& start_Ns,
const std::vector<uint64_t>& end_Ns)
{
*m_so << "\n outerLoop \n m_upperLimitN = 0x"
<< std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
<< "\n m_N = 0x" << std::setw(16) << start_Ns[0]
<< " " << std::dec << std::setfill(' ') << std::setw(3) << 0
<< "." << std::dec << std::setfill('0') << std::setw(3) << 0
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10)
<< start_Ns[0] << ")" << std::endl;
size_t szStart = start_Ns.size();
size_t szEnd = end_Ns.size();
assert(szEnd == szStart);
m_blk = 0;
{ // when blk 0 starts at offset 2 (to skip values 0 and 1)
if(TWO == start_Ns[0])
{
m_N = TWO; // set m_N start
unwind98(); // skip 0 and 1
assert(100 == m_N); // confirm
innerLoop (100, end_Ns[m_blk]);
m_blk += 1;
}
else // thread 2 blk 0 starts in the middle of 5 billion range
{
m_N = start_Ns[m_blk]; // set m_N start, do full block
}
}
for (; m_blk < szStart; ++m_blk)
{
innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
}
}
// 1 == argc
void exec1 (void) // no parameters --> check all values
{
const std::vector<uint64_t> start_Ns =
{ TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
(BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 0 blk 1 blk 2 blk 3 blk 4
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
(BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
// outerLoop : 10 blocks generates 10 progressReports()
outerLoop( start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
} // void exec1 (void)
void exec2a () // thread entry a
{
//lower half of test range
const std::vector<uint64_t> start_Ns =
{ TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
// blk 0 blk 1 blk 2 blk 3 blk 4
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };
m_startMS = Clock_t::now();
// m_so to std::cout
// uses 5 blocks to gen 5 progress indicators
outerLoop (start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " (a) " << std::endl;
} // exec2a (void)
void exec2b () // thread entry b
{
// upper half of test range
const std::vector<uint64_t> start_Ns =
{ (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
m_so = &m_ssMagic;
// uses 5 blocks to gen 5 progress indicators
outerLoop(start_Ns, end_Ns);
ssDurationPut("");
m_ssMagic << " execDuration = " << m_ssDuration.str() << " (b) " << std::endl;
} // exec2b (void)
// 3 == argc
int exec3 (std::string argv1, // recursion start value
std::string argv2) // values count
{
uint64_t start_N = std::stoull(argv1);
uint64_t length = std::stoull(argv2);
// range checks
{
std::string note1;
switch (start_N) // limit check
{
case 0: { start_N = 3; length -= 3;
note1 = "... 0 is below test range (change start to 3 ";
} break;
case 1: { start_N = 3; length -= 2;
note1 = "... 1 is defined magic (change start to 3";
} break;
case 2: { start_N = 3; length -= 1;
note1 = "... 2 is trivially magic (change start to 3";
} break;
default:
{ // when start_N from user is even
if (ZERO == (start_N & B00))
{
start_N -= 1;
note1 = "... even is not an interesting starting point";
length += 1;
}
}
break;
} // switch (start_N)
// start_N not restrained ... allow any manual search
// but uin64_t wrap-around still preempted
std::cout << "\n start_N: " << start_N << " " << note1
<< "\n length: " << length << " " << std::endl;
}
m_startMS = Clock_t::now();
for (m_N = start_N; m_N < (start_N + length); ++m_N)
{
bool magicNumberFound = checkMagicRD (m_N, 1);
std::cout << m_ssMagic.str() << std::dec << m_N
<< " m_MaxRcrsDpth: " << m_MaxRcrsDpth << ";"
<< " m_Max_n: " << m_Max_n << ";" << std::endl;
m_ssMagic.str(std::string()); // empty string
m_ssMagic.clear(); // clear flags)
if (!magicNumberFound)
{
std::cerr << m_N << " not a magic number!" << std::endl;
break;
}
} // for (uint64_t k = kStart; k < kEnd; ++k)
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
return(0);
} // int exec3 (std::string argv1, std::string argv2)
// conversion
std::string ssMagicShow() { return (m_ssMagic.str()); }
// D3: use recursion for closer match to definition
bool checkMagicR(uint64_t n) // recursive Performance
{
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicR(nxt)); // tail recursion
} // bool checkMagicR(uint64_t n)
// D4: diagnostic text output
bool checkMagicRD (uint64_t n, uint64_t rd) // rd: recursion depth
{
if (n > m_Max_n) m_Max_n = n;
m_TotalRecurse += 1;
m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
<< " checkMagicRD (" << m_N << ", " << n << ")";
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return ( terminateCheckMagic (nxt, rd, false) );
// NOTE: the sum of 4 odd uint's is even, and will be divided by 2
// we need not wait for the next recursion to divide by 2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// ||||||||| ||||||
// sum is even unknown after divide by two
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( terminateCheckMagic (nxt, rd, true) );
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicRD (nxt, (rd+1))); // tail recursion
} // bool checkMagicRD(uint64_t, uint64_t rd)
bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
{
if (rd > m_MaxRcrsDpth) // diag only
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = durationMS.count() / 1000;
uint64_t durationMSec = durationMS.count() % 1000;
m_MaxRcrsDpth = rd;
// generate noise each update - this does not happen often
static uint64_t count = 0;
std::cerr << "\n\n" << std::setfill(' ')
<< std::setw(30) << "["
<< std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
<< durationMSec << std::setfill(' ')
<< " sec] m_N: " << std::setw(10) << m_N
<< " n: " << std::setw(10) << n
<< " MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
<< " Max_n: " << std::setw(16) << m_Max_n
<< " (" << count << ")" << std::flush;
count += 1; // how many times MaxRcrsDpth updated
}
m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
<< " " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";
return (rslt);
}
uint64_t initWrapAroundLimit()
{
uint64_t upLimit = 0;
uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
// there exist a max number, m_upperLimitN
// where 3n+1 < u64MAX, i.e.
upLimit = ((u64MAX - 1) / 3);
return (upLimit);
} // uint64_t initWrapAroundLimit()
public:
int exec (int argc, char* argv[])
{
int retVal = -1;
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
// result: consistent run time
switch (argc)
{
case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
{
exec1();
retVal = 0;
}
break;
case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
{ // two new instances, < 40 sec
// 2 new objects
T431_t t431a(MaxDualCoreMS); // lower test sequence
T431_t t431b(MaxDualCoreMS); // upper test sequence
// 2 threads
std::thread tA (&T431_t::exec2a, &t431a);
std::thread tB (&T431_t::exec2b, &t431b);
// 2 join's
tA.join();
tB.join();
// lower test sequence (tA) output directly to std::cout
// upper test sequence (tB) captured in T431_t::m_ssMagic. send it now
std::cout << t431b.ssMagicShow() << std::endl;
retVal = 0;
}
break;
case 3: // argv[1] -> start address, argv[2] -> length,
{
retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
} break;
default :
{
std::cerr
<< "\nUsage: "
<< "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
<< "\n 1 (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
//
<< "\n 2 (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks, < 40 sec\n"
//
<< "\n 3 (two parameters): verbose mode: start argv[1], test count argv[2] \n"
//
<< "\n 3.1: \"./dumy431V4 3 1000 1> /tmp/dumy431V4.out2 2> /tmp/dumy431V4.out2a "
<< "\n 3 1 K std::cout redirected std::cerr redirected \n"
//
<< "\n 3.2: \"./dumy431V4 3 1000000000 1> /dev/null 2> /tmp/dumy431V4.out2b "
<< "\n 3 1 B discard std::cout std::cerr redirected \n"
//
<< "\n 3.3: \"./dumy431V4 15 100 "
<< "\n 15 100 (cout and cerr to console) \n"
<< std::endl;
retVal = 0;
}
} // switch
return(retVal);
} // int exec(int argc, char* argv[])
}; // class T431
int main (int argc, char* argv[])
{
std::cout << "argc: " << argc << std::endl;
for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
int retVal;
{
T431_t t431;
retVal = t431.exec(argc, argv);
}
std::cout << "\nFINI " << std::endl;
return(retVal);
}
提交的代码:
a) 使用 assert(x) 进行所有检查(因为 assert 很快并且有
优化器不能忽视的副作用)。
b) 通过使用持续时间断言检测任何无限循环。
c) 断言以防止任何回绕。
当任何断言发生时,程序不会正常终止。
当这段代码正常结束时,意味着:
a) no disproven numbers found, and
b) the running time < 80 seconds, so no endless loop occurred and
c) no wrap-around occurred
关于递归 checkMagicR(uint64_t n) 的注释:
递归中 n 的 3 种形式可用:
形式参数 n - 典型的递归调用
auto var nxt - 接收下一个 n 的计算值
递归调用
数据属性:m_N-当前运行的起点
递归努力
我在猜测优化器可能对这个特定的尾递归进行了哪些更改。
所以我从我的 V9 代码(在我的另一个答案中)创建了一个迭代答案。
更改自:
bool checkMagicR(uint64_t n)
{
//.. replace all
}
更改为:
// ITERATIVE VERSION
bool checkMagicR(uint64_t n) // ITERATIVE Performance
{
do
{ uint64_t nxt;
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause.
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause.
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
n = nxt;
} while(true); // NO recursion
}
没有修复名称。未修复此方法或其他方法中对 'recursion' 的注释或其他提及。我更改了此方法顶部和底部的一些注释。
结果:
性能相同。 (44 秒)
模式一和模式二都是迭代的
模式 3(仍然)是递归的。
幻数
一个正整数是“神奇的”,当且仅当它可以通过重复除以 2(如果是偶数)或乘以 3 然后加 1(如果是奇数)来减少到 1。因此,例如,3 是神奇的,因为 3 首先减少到 10 (3*3+1),然后减少到 5 (10/2),然后减少到 16 (5*3+1),然后减少到 8 (16/2) ,然后是 4 (8/2),然后是 2 (4/2),最后是 1 (2/2)。幻数假说指出所有正整数都是幻数,或者,正式地说:∀x ∈ Z, MAGIC(x) 其中 MAGIC(x) 是谓词“x is magic”。
我们应该开发一个 C++ 程序来找到 "Magic Numbers" 从 1 到 50 亿。如果正确完成,应该需要 80 秒或更短的时间。我的大约需要 2 小时,而我们得到的示例程序需要 14 天。这是我的代码,我该怎么做才能加快速度?我是否遗漏了明显的优化问题?
#include <iostream>
using namespace std;
bool checkMagic(unsigned long number);
int main()
{
for(unsigned long i = 1; i <= 5000000000; i++)
{
if(i % 1000000 == 0)
{
//only print out every 1000000 numbers to keep track, but slow it down
//by printing out every single number
cout<<i<<endl;
}
if(!checkMagic(i))
{
//not magic
cout<<i<<" not a magic number"<<endl;
break;
}
}
}
bool checkMagic(unsigned long number)
{
if(number % 2 == 0)
{
number = number / 2;
}
else
{
number = (number * 3) + 1;
}
if(number != 1)
{
checkMagic(number);
}
return 1;
}
这道题主要是想验证Collatz Conjecture up to 5B.
我认为这里的关键是,对于我们正在检查的每个数字n,查看乐观情景和悲观情景,并在恢复到乐观情景之前检查乐观情景悲观者。
在乐观的情况下,我们根据 n / 2 修改 n ; 3n + 1规则,数字序列将:
在有限的步骤中变得小于 n(在这种情况下,我们可以检查我们对那个较小数字的了解)。
不会造成步骤溢出
(正如 TonyK 正确指出的那样,我们不能依赖(甚至不依赖第一个))。
所以,对于乐观的场景,我们可以使用下面的函数:
#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>
using namespace std;
using found_set = unordered_set<size_t>;
bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
size_t tries = 0;
size_t n = i;
while(n != 1) {
if(++tries == max_tries )
return false;
if(n < i)
return found.empty() || found.find(i) == found.end();
if(n % 2 == 0)
n /= 2;
else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
return false;
}
return true;
}
注意以下几点:
该函数仅尝试验证其接收到的数字的猜想。如果是returns
true
,就已经验证过了。如果returnsfalse
,那只是说明它还没有被验证(即不代表它已经被反驳了)。它带一个参数
max_tries
,最多只验证这个步数。如果超过了这个数字,它不会尝试辨别这是否是无限循环的一部分 - 它只是 returns 验证失败。它需要一个
unordered_set
个失败的已知数(当然,如果Collatz猜想成立,这个集合将永远是空的)。它通过
__builtin_*_overflow
检测溢出。 (不幸的是,这是特定于 gcc 的。不同的平台可能需要一组不同的此类函数。)
现在是缓慢但可靠的功能。这个函数使用了GNU MP multi-precision arithmetic library。它通过维护已经遇到的数字序列来检查无限循环。这个函数returns true
if猜想对这个数已经证明,false
if对这个数反驳(注意和之前快速验证的区别)
bool slow_check(size_t i) {
mpz_t n_;
mpz_init(n_);
mpz_t rem_;
mpz_init(rem_);
mpz_t i_;
mpz_init(i_);
mpz_set_ui(i_, i);
mpz_set_ui(n_, i);
list<mpz_t> seen;
while(mpz_cmp_ui(n_, 1) != 0) {
if(mpz_cmp_ui(n_, i) < 0)
return true;
mpz_mod_ui(rem_, n_, 2);
if(mpz_cmp_ui(rem_, 0) == 0) {
mpz_div_ui(n_, n_, 2);
}
else {
mpz_mul_ui(n_, n_, 3);
mpz_add_ui(n_, n_, 1);
}
seen.emplace_back(n_);
for(const auto &e0: seen)
for(const auto &e1: seen)
if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
return false;
}
return true;
}
最后,main
维护了一个 unordered_set
被证伪的数字。对于每个数字,它乐观地尝试验证猜想,然后,如果它失败(溢出或超过迭代次数),使用慢速方法:
int main()
{
const auto max_num = 5000000000;
found_set found;
for(size_t i = 1; i <= max_num; i++) {
if(i % 1000000000 == 0)
cout << "iteration " << i << endl;
auto const f = fast_verify(i, max_num, found);
if(!f && !slow_check(i))
found.insert(i);
}
for(auto e: found)
cout << e << endl;
}
运行 完整代码(下方)给出:
$ g++ -O3 --std=c++11 magic2.cpp -lgmp && time ./a.out
iteration 1000000000
iteration 2000000000
iteration 3000000000
iteration 4000000000
iteration 5000000000
real 1m3.933s
user 1m3.924s
sys 0m0.000s
$ uname -a
Linux atavory 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ sudo lshw | grep -i cpu
*-cpu
description: CPU
product: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
bus info: cpu@0
version: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts cpufreq
也就是说,没有发现被证实的数字,运行 时间在 ~64 秒。
完整代码:
#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>
using namespace std;
using found_set = unordered_set<size_t>;
bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
size_t tries = 0;
size_t n = i;
while(n != 1) {
if(++tries == max_tries )
return false;
if(n < i)
return found.empty() || found.find(i) == found.end();
if(n % 2 == 0)
n /= 2;
else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
return false;
}
return true;
}
bool slow_check(size_t i) {
mpz_t n_;
mpz_init(n_);
mpz_t rem_;
mpz_init(rem_);
mpz_t i_;
mpz_init(i_);
mpz_set_ui(i_, i);
mpz_set_ui(n_, i);
list<mpz_t> seen;
while(mpz_cmp_ui(n_, 1) != 0) {
if(mpz_cmp_ui(n_, i) < 0)
return true;
mpz_mod_ui(rem_, n_, 2);
if(mpz_cmp_ui(rem_, 0) == 0) {
mpz_div_ui(n_, n_, 2);
}
else {
mpz_mul_ui(n_, n_, 3);
mpz_add_ui(n_, n_, 1);
}
seen.emplace_back(n_);
for(const auto &e0: seen)
for(const auto &e1: seen)
if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
return false;
}
return true;
}
int main()
{
const auto max_num = 5000000000;
found_set found;
for(size_t i = 1; i <= max_num; i++) {
if(i % 1000000000 == 0)
cout << "iteration " << i << endl;
auto const f = fast_verify(i, max_num, found);
if(!f && !slow_check(i))
found.insert(i);
}
for(auto e: found)
cout << e << endl;
return 0;
}
您的代码和 Ami Tavory 的回答完全忽略了溢出问题。如果您的范围内 是 一个非幻数,那么 Collatz 迭代要么进入循环,要么无限增加。如果它无限增加,它肯定会溢出,不管你的 unsigned long
有多大;如果它进入一个循环,它很可能在到达那里之前溢出。所以你必须在那里进行溢出检查:
#include <limits.h>
...
if (number > (ULONG_MAX - 1) / 3)
PrintAnApologyAndDie() ;
number = (number * 3) + 1;
what can I do to speed it up? Am I missing obvious optimization issues?
- for(unsigned long i = 1; i <= 5000000000; i++)
for 循环不是调用方法 5B 次的最快编码风格。
在我的代码中,请参阅 unwind100() 和 innerLoop()。我编写的 unwind'ing 消除了 99% 的 innerLoop 开销,节省了超过 5 秒(在我的桌面 DD5 上)。您的储蓄可能会有所不同。维基百科有一篇关于展开循环的文章,其中简要描述了节省的潜力。
- if (i % 1000000 == 0)
此代码显然会生成进度报告。耗资50亿 比较,它对幻数检查没有任何作用。
查看我的代码 outerLoop,它调用了 10 次 innerLoop。这提供了一个 每个外循环的 1 个进度报告的方便位置。考虑拆分 你的循环变成 'innerLoop' 和 'outerLoop'。测试 50 亿次是 比向 innerLoops 添加 10 次调用更昂贵。
- 如评论中所述,无论测试结果如何,您的 checkMagic() return 都是“1”。一个简单的错误,但我不知道这个错误是否影响性能。
我认为您的 checkMagic() 没有实现尾递归,这确实会降低您的代码速度。
查看我的方法 checkMagicR(),它具有尾递归,并且 returns 为真或 假的。
此外,在您的 CheckMagic() 中,在奇数输入的子句中,您忽略了当 n 为正数且为奇数时 (3n+1) 必须为偶数的想法。我的代码执行 'early-div-by-2',从而尽可能减少未来的工作量。
- 在我的方法 unwind10() 中,请注意,我的代码通过使用 checkMagicRO()(对于奇数输入)和checkMagicRE()(偶数)。
在我的早期版本中,代码浪费时间检测已知偶数输入是否为偶数,然后将其除以 2,然后 return 为真。 checkMagicRE() 跳过测试和划分,节省时间。
CheckMagicRO() 跳过奇数测试,然后执行奇数并继续进入 checkMagicR()。
- if(number != 1) { checkMagic(number); }
您的递归一直持续到 1 ...没有必要,并且可能对整体性能造成最大的打击。
查看我的 checkMagicR() "recursion termination clause"。我的代码在 ((n/2) < m_N) 时停止,m_N 是启动此递归的测试值。这个想法是所有较小的测试输入都已经被证明是神奇的,否则代码就会断言。当环绕即将到来时它也会停止......这还没有发生。
结果:
我的桌面 (DD5) 更旧、更慢,并且 运行 Ubuntu 15.10 (64)。此代码是使用 g++ v5.2.1 开发的,并且仅在没有超时断言的情况下使用 -O3 完成。
DD5 显然比 Amy Tavory 使用的机器慢 2 倍(比较他的代码 运行 在 DD5 上的表现)。
在 DD5 上,我的代码在模式 1 中完成大约 44 秒,在模式 2 中完成 22 秒。
更快的机器可能会在 11 秒内完成此代码(在模式 2 中)。
完整代码:
// V9
#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>
// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock Clock_t;
class T431_t
{
private:
uint64_t m_N; // start-of-current-recursion
const uint64_t m_upperLimitN; // wrap-around avoidance / prevention
std::stringstream m_ssMagic; // dual threads use separate out streams
uint64_t m_MaxRcrsDpth; // diag only
uint64_t m_Max_n; // diag only
uint64_t m_TotalRecurse; // diag only
Clock_t::time_point m_startMS; // Derived req - progress info
std::stringstream m_ssDuration;
std::ostream* m_so; // dual threads - std::cout or m_ssMagic
uint64_t m_blk;
const int64_t m_MaxThreadDuration; // single Max80KMs, dual Max40KMs
// enum of known type (my preference for 'internal' constants)
enum T431_Constraints : uint64_t
{
ZERO = 0,
B00 = 1, // least-significant-bit is bit 0
ONE = 1, // lsbit set
TWO = 2,
THREE = 3,
//
K = 1000, // instead of 1024,
//
BlkSize = ((K * K * K) / 2), // 0.5 billion
//
MaxSingleCoreMS = (80 * K), // single thread max
MaxDualCoreMS = (40 * K) // dual thread max
};
// convenience method: show both hex and decimal on one line (see dtor)
inline std::string showHexDec(const std::string& label, uint64_t val) {
std::stringstream ss;
ss << "\n" << label
<< " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
<< " (" << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
return (ss.str());
}
// duration: now() - m_startMS
std::chrono::milliseconds getChronoMillisecond() {
return (std::chrono::duration_cast<std::chrono::milliseconds>
(Clock_t::now() - m_startMS));
}
// reports milliseconds since m_startMS time
void ssDurationPut (const std::string& label)
{
m_ssDuration.str(std::string()); // empty string
m_ssDuration.clear(); // clear ss flags
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
m_ssDuration
<< label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec";
}
inline void reportProgress()
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
//
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
*m_so << " m_blk = " << m_blk
<< " m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
<< " " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
<< std::endl;
}
public:
T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
m_N (TWO), // start val of current recursion
m_upperLimitN (initWrapAroundLimit()),
// m_ssMagic // default ctor ok - empty
m_MaxRcrsDpth (ZERO),
m_Max_n (TWO),
m_TotalRecurse (ZERO),
m_startMS (Clock_t::now()),
// m_ssDuration // default ctor ok - empty
m_so (&std::cout), // default
m_blk (ZERO),
m_MaxThreadDuration (maxDuration)
{
m_ssMagic.str().reserve(1024*2);
m_ssDuration.str().reserve(256);
}
~T431_t()
{
// m_MaxThreadDuration
// m_blk
// m_so
// m_ssDuration
// m_startMS
// m_Max_n
// m_TotalRecurse
// m_MaxRcrsDpth
if (m_MaxRcrsDpth) // diag only
{
ssDurationPut ("\n duration = " );
std::cerr << "\n T431() dtor "
//
<< showHexDec(" u64MAX",
std::numeric_limits<uint64_t>::max()) << "\n"
//
<< showHexDec(" m_Max_n", m_Max_n)
<< showHexDec(" (3*m_Max_n)+1", ((3*m_Max_n)+1))
<< showHexDec(" upperLimitN", m_upperLimitN)
// ^^^^^^^^^^^ no wrap-around possible when n is less
//
<< "\n"
<< showHexDec(" m_MaxRcrsDpth", m_MaxRcrsDpth)
<< "\n"
<< showHexDec(" m_TotalRecurse", m_TotalRecurse)
//
<< m_ssDuration.str() << std::endl;
}
// m_ssMagic
// m_upperLimitN
// m_N
if(m_MaxThreadDuration == MaxSingleCoreMS)
std::cerr << "\n " << __FILE__ << " by Douglas O. Moen Compiled on = "
<< __DATE__ << " " << __TIME__ << "\n";
}
private:
inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
{
return(true); // for unit test, comment out this line to run the asserts
// unit test - enable both asserts
assert(nE == m_N); // confirm recursion start value
assert(ZERO == (nE & B00)); // confirm even
return(true); // nothing more to do
}
inline bool checkMagicRO(uint64_t nO) // n is known odd, and the recursion starting point
{
// unit test - uncomment the following line to measure checkMagicRE() performance
// return (true); // when both methods do nothing - 0.044
// unit test - enable both these asserts
// assert(nO == m_N); // confirm recursion start value
// assert(nO & B00); // confirm odd
uint64_t nxt;
{
assert(nO < m_upperLimitN); // assert that NO wrap-around occurs
// NOTE: the sum of 4 odd uint's is even, and is destined to be
// divided by 2 in the next recursive invocation.
// we need not wait to divide by 2
nxt = ((nO+nO+nO+ONE) >> ONE); // combine the arithmetic
// |||||||||||| ||||||
// sum is even unknown after sum divided by two
}
return (checkMagicR(nxt));
}
inline void unwind8()
{
// skip 0, 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind10()
{
assert(checkMagicRE (m_N)); m_N++; // 0
assert(checkMagicRO (m_N)); m_N++; // 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind98()
{
unwind8();
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void unwind100()
{
unwind10(); // 0
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void innerLoop (uint64_t start_N, uint64_t end_N)
{
for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
{
unwind100();
}
reportProgress();
}
inline void outerLoop(const std::vector<uint64_t>& start_Ns,
const std::vector<uint64_t>& end_Ns)
{
*m_so << "\n outerLoop \n m_upperLimitN = 0x"
<< std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
<< "\n m_N = 0x" << std::setw(16) << start_Ns[0]
<< " " << std::dec << std::setfill(' ') << std::setw(3) << 0
<< "." << std::dec << std::setfill('0') << std::setw(3) << 0
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10)
<< start_Ns[0] << ")" << std::endl;
size_t szStart = start_Ns.size();
size_t szEnd = end_Ns.size();
assert(szEnd == szStart);
m_blk = 0;
{ // when blk 0 starts at offset 2 (to skip values 0 and 1)
if(TWO == start_Ns[0])
{
m_N = TWO; // set m_N start
unwind98(); // skip 0 and 1
assert(100 == m_N); // confirm
innerLoop (100, end_Ns[m_blk]);
m_blk += 1;
}
else // thread 2 blk 0 starts in the middle of 5 billion range
{
m_N = start_Ns[m_blk]; // set m_N start, do full block
}
}
for (; m_blk < szStart; ++m_blk)
{
innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
}
}
// 1 == argc
void exec1 (void) // no parameters --> check all values
{
const std::vector<uint64_t> start_Ns =
{ TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
(BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 0 blk 1 blk 2 blk 3 blk 4
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
(BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
// outerLoop : 10 blocks generates 10 progressReports()
outerLoop( start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
} // void exec1 (void)
void exec2a () // thread entry a
{
//lower half of test range
const std::vector<uint64_t> start_Ns =
{ TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
// blk 0 blk 1 blk 2 blk 3 blk 4
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };
m_startMS = Clock_t::now();
// m_so to std::cout
// uses 5 blocks to gen 5 progress indicators
outerLoop (start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " (a) " << std::endl;
} // exec2a (void)
void exec2b () // thread entry b
{
// upper half of test range
const std::vector<uint64_t> start_Ns =
{ (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
m_so = &m_ssMagic;
// uses 5 blocks to gen 5 progress indicators
outerLoop(start_Ns, end_Ns);
ssDurationPut("");
m_ssMagic << " execDuration = " << m_ssDuration.str() << " (b) " << std::endl;
} // exec2b (void)
// 3 == argc
int exec3 (std::string argv1, // recursion start value
std::string argv2) // values count
{
uint64_t start_N = std::stoull(argv1);
uint64_t length = std::stoull(argv2);
// range checks
{
std::string note1;
switch (start_N) // limit check
{
case 0: { start_N = 3; length -= 3;
note1 = "... 0 is below test range (change start to 3 ";
} break;
case 1: { start_N = 3; length -= 2;
note1 = "... 1 is defined magic (change start to 3";
} break;
case 2: { start_N = 3; length -= 1;
note1 = "... 2 is trivially magic (change start to 3";
} break;
default:
{ // when start_N from user is even
if (ZERO == (start_N & B00))
{
start_N -= 1;
note1 = "... even is not an interesting starting point";
length += 1;
}
}
break;
} // switch (start_N)
// start_N not restrained ... allow any manual search
// but uin64_t wrap-around still preempted
std::cout << "\n start_N: " << start_N << " " << note1
<< "\n length: " << length << " " << std::endl;
}
m_startMS = Clock_t::now();
for (m_N = start_N; m_N < (start_N + length); ++m_N)
{
bool magicNumberFound = checkMagicRD (m_N, 1);
std::cout << m_ssMagic.str() << std::dec << m_N
<< " m_MaxRcrsDpth: " << m_MaxRcrsDpth << ";"
<< " m_Max_n: " << m_Max_n << ";" << std::endl;
m_ssMagic.str(std::string()); // empty string
m_ssMagic.clear(); // clear flags)
if (!magicNumberFound)
{
std::cerr << m_N << " not a magic number!" << std::endl;
break;
}
} // for (uint64_t k = kStart; k < kEnd; ++k)
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
return(0);
} // int exec3 (std::string argv1, std::string argv2)
// conversion
std::string ssMagicShow() { return (m_ssMagic.str()); }
// D3: use recursion for closer match to definition
bool checkMagicR(uint64_t n) // recursive Performance
{
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicR(nxt)); // tail recursion
} // bool checkMagicR(uint64_t n)
// D4: diagnostic text output
bool checkMagicRD (uint64_t n, uint64_t rd) // rd: recursion depth
{
if (n > m_Max_n) m_Max_n = n;
m_TotalRecurse += 1;
m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
<< " checkMagicRD (" << m_N << ", " << n << ")";
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return ( terminateCheckMagic (nxt, rd, false) );
// NOTE: the sum of 4 odd uint's is even, and will be divided by 2
// we need not wait for the next recursion to divide by 2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// ||||||||| ||||||
// sum is even unknown after divide by two
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( terminateCheckMagic (nxt, rd, true) );
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicRD (nxt, (rd+1))); // tail recursion
} // bool checkMagicRD(uint64_t, uint64_t rd)
bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
{
if (rd > m_MaxRcrsDpth) // diag only
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = durationMS.count() / 1000;
uint64_t durationMSec = durationMS.count() % 1000;
m_MaxRcrsDpth = rd;
// generate noise each update - this does not happen often
static uint64_t count = 0;
std::cerr << "\n\n" << std::setfill(' ')
<< std::setw(30) << "["
<< std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
<< durationMSec << std::setfill(' ')
<< " sec] m_N: " << std::setw(10) << m_N
<< " n: " << std::setw(10) << n
<< " MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
<< " Max_n: " << std::setw(16) << m_Max_n
<< " (" << count << ")" << std::flush;
count += 1; // how many times MaxRcrsDpth updated
}
m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
<< " " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";
return (rslt);
}
uint64_t initWrapAroundLimit()
{
uint64_t upLimit = 0;
uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
// there exist a max number, m_upperLimitN
// where 3n+1 < u64MAX, i.e.
upLimit = ((u64MAX - 1) / 3);
return (upLimit);
} // uint64_t initWrapAroundLimit()
public:
int exec (int argc, char* argv[])
{
int retVal = -1;
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
// result: consistent run time
switch (argc)
{
case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
{
exec1();
retVal = 0;
}
break;
case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
{ // two new instances, < 40 sec
// 2 new objects
T431_t t431a(MaxDualCoreMS); // lower test sequence
T431_t t431b(MaxDualCoreMS); // upper test sequence
// 2 threads
std::thread tA (&T431_t::exec2a, &t431a);
std::thread tB (&T431_t::exec2b, &t431b);
// 2 join's
tA.join();
tB.join();
// lower test sequence (tA) output directly to std::cout
// upper test sequence (tB) captured in T431_t::m_ssMagic. send it now
std::cout << t431b.ssMagicShow() << std::endl;
retVal = 0;
}
break;
case 3: // argv[1] -> start address, argv[2] -> length,
{
retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
} break;
default :
{
std::cerr
<< "\nUsage: "
<< "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
<< "\n 1 (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
//
<< "\n 2 (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks, < 40 sec\n"
//
<< "\n 3 (two parameters): verbose mode: start argv[1], test count argv[2] \n"
//
<< "\n 3.1: \"./dumy431V4 3 1000 1> /tmp/dumy431V4.out2 2> /tmp/dumy431V4.out2a "
<< "\n 3 1 K std::cout redirected std::cerr redirected \n"
//
<< "\n 3.2: \"./dumy431V4 3 1000000000 1> /dev/null 2> /tmp/dumy431V4.out2b "
<< "\n 3 1 B discard std::cout std::cerr redirected \n"
//
<< "\n 3.3: \"./dumy431V4 15 100 "
<< "\n 15 100 (cout and cerr to console) \n"
<< std::endl;
retVal = 0;
}
} // switch
return(retVal);
} // int exec(int argc, char* argv[])
}; // class T431
int main (int argc, char* argv[])
{
std::cout << "argc: " << argc << std::endl;
for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
int retVal;
{
T431_t t431;
retVal = t431.exec(argc, argv);
}
std::cout << "\nFINI " << std::endl;
return(retVal);
}
提交的代码:
a) 使用 assert(x) 进行所有检查(因为 assert 很快并且有 优化器不能忽视的副作用)。
b) 通过使用持续时间断言检测任何无限循环。
c) 断言以防止任何回绕。
当任何断言发生时,程序不会正常终止。
当这段代码正常结束时,意味着:
a) no disproven numbers found, and
b) the running time < 80 seconds, so no endless loop occurred and
c) no wrap-around occurred
关于递归 checkMagicR(uint64_t n) 的注释:
递归中 n 的 3 种形式可用:
形式参数 n - 典型的递归调用
auto var nxt - 接收下一个 n 的计算值 递归调用
数据属性:m_N-当前运行的起点 递归努力
我在猜测优化器可能对这个特定的尾递归进行了哪些更改。
所以我从我的 V9 代码(在我的另一个答案中)创建了一个迭代答案。
更改自:
bool checkMagicR(uint64_t n)
{
//.. replace all
}
更改为:
// ITERATIVE VERSION
bool checkMagicR(uint64_t n) // ITERATIVE Performance
{
do
{ uint64_t nxt;
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause.
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause.
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
n = nxt;
} while(true); // NO recursion
}
没有修复名称。未修复此方法或其他方法中对 'recursion' 的注释或其他提及。我更改了此方法顶部和底部的一些注释。
结果:
性能相同。 (44 秒)
模式一和模式二都是迭代的
模式 3(仍然)是递归的。