使用 -O3 的两个等效函数的公平计时
Fair timing of two equivalent functions with -O3
一位同事给我发了一些有趣的代码,用于计算整数的 LSB 索引:
unsigned int v; // the input number
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
我很好奇我是否能打败它(有一个等效的函数希望 运行s 更快)并且最初我写了这段代码来测试它:
#include <ctime>
#include <iomanip>
#include <iostream>
//-----------------------------------------------------------------------------------------------------------------------
unsigned int alexIdea(unsigned int i)
{
switch (i & -i)
{
case 0x0u:
case 0x1u:
return 0;
case 0x2u:
return 1;
case 0x4u:
return 2;
case 0x8u:
return 3;
case 0x10u:
return 4;
case 0x20u:
return 5;
case 0x40u:
return 6;
case 0x80u:
return 7;
case 0x100u:
return 8;
case 0x200u:
return 9;
case 0x400u:
return 10;
case 0x800u:
return 11;
case 0x1000u:
return 12;
case 0x2000u:
return 13;
case 0x4000u:
return 14;
case 0x8000u:
return 15;
case 0x10000u:
return 16;
case 0x20000u:
return 17;
case 0x40000u:
return 18;
case 0x80000u:
return 19;
case 0x100000u:
return 20;
case 0x200000u:
return 21;
case 0x400000u:
return 22;
case 0x800000u:
return 23;
case 0x1000000u:
return 24;
case 0x2000000u:
return 25;
case 0x4000000u:
return 26;
case 0x8000000u:
return 27;
case 0x10000000u:
return 28;
case 0x20000000u:
return 29;
case 0x40000000u:
return 30;
case 0x80000000u:
return 31;
}
return 0;
}
//-----------------------------------------------------------------------------------------------------------------------
const int multiplyDeBruijnBitPosition[32] = {
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
unsigned int strangeThing(unsigned int i)
{
return multiplyDeBruijnBitPosition[((unsigned int)((i & -i) * 0x077cb531u)) >> 27];
}
//-----------------------------------------------------------------------------------------------------------------------
int main()
{
const unsigned int NUM_ITERATIONS = 1000000000;
//-------------------------------------------------------------------------------------------------------------------
// Compare Alex's idea to the strange thing
//-------------------------------------------------------------------------------------------------------------------
bool equivalent = true;
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
if (alexIdea(i) != strangeThing(i)) {
equivalent = false;
break;
}
std::cout << (equivalent ? "Equivalent functions" : "Alex screwed up") << std::endl;
//-------------------------------------------------------------------------------------------------------------------
// See if Alex's idea is faster than the strange thing
//-------------------------------------------------------------------------------------------------------------------
clock_t start;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
alexIdea(i);
std::cout << std::setprecision(51) << "Alex's idea time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
strangeThing(i);
std::cout << std::setprecision(51) << "Strange thing's time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
return 0;
}
我得到了一些奇怪的结果,似乎奇怪的功能快得不可思议:
Alexanders-MBP:Desktop alexandersimes$ !g
g++ foo.cpp -O3
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 9.99999999999999954748111825886258685613938723690808e-07
Strange thing's time: 0
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 1.99999999999999990949622365177251737122787744738162e-06
Strange thing's time: 9.99999999999999954748111825886258685613938723690808e-07
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 3.00000000000000007600257229123386082392244134098291e-06
Strange thing's time: 9.99999999999999954748111825886258685613938723690808e-07
我尝试添加条件(在 运行 时间内不会成立)以尝试强制计算两者。 main的计时部分改为:
//-------------------------------------------------------------------------------------------------------------------
// See if Alex's idea is faster than the strange thing
//-------------------------------------------------------------------------------------------------------------------
clock_t start;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
if (alexIdea(i) == 31) // This will never happen, but don't optimize out the calculation
std::cout << "This will never happen" << std::endl;
std::cout << std::fixed << std::setprecision(8)
<< "Alex's idea time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
if (strangeThing(i) == 31) // This will never happen, but don't optimize out the calculation
std::cout << "This will never happen" << std::endl;
std::cout << std::fixed << std::setprecision(8)
<< "Strange thing's time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
导致:
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 0.00000200
Strange thing's time: 1.53144700
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 0.00000300
Strange thing's time: 1.44950600
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 0.00000200
Strange thing's time: 1.57773800
第二次计时有效吗?有没有更好的方法来计时功能?
创建一个变量 dummy
并将其初始化为 argc
。然后在每个循环中通过函数的 return 递增 dummy
。然后returndummy
。这应该会阻止编译器省略函数调用。
一位同事给我发了一些有趣的代码,用于计算整数的 LSB 索引:
unsigned int v; // the input number
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
我很好奇我是否能打败它(有一个等效的函数希望 运行s 更快)并且最初我写了这段代码来测试它:
#include <ctime>
#include <iomanip>
#include <iostream>
//-----------------------------------------------------------------------------------------------------------------------
unsigned int alexIdea(unsigned int i)
{
switch (i & -i)
{
case 0x0u:
case 0x1u:
return 0;
case 0x2u:
return 1;
case 0x4u:
return 2;
case 0x8u:
return 3;
case 0x10u:
return 4;
case 0x20u:
return 5;
case 0x40u:
return 6;
case 0x80u:
return 7;
case 0x100u:
return 8;
case 0x200u:
return 9;
case 0x400u:
return 10;
case 0x800u:
return 11;
case 0x1000u:
return 12;
case 0x2000u:
return 13;
case 0x4000u:
return 14;
case 0x8000u:
return 15;
case 0x10000u:
return 16;
case 0x20000u:
return 17;
case 0x40000u:
return 18;
case 0x80000u:
return 19;
case 0x100000u:
return 20;
case 0x200000u:
return 21;
case 0x400000u:
return 22;
case 0x800000u:
return 23;
case 0x1000000u:
return 24;
case 0x2000000u:
return 25;
case 0x4000000u:
return 26;
case 0x8000000u:
return 27;
case 0x10000000u:
return 28;
case 0x20000000u:
return 29;
case 0x40000000u:
return 30;
case 0x80000000u:
return 31;
}
return 0;
}
//-----------------------------------------------------------------------------------------------------------------------
const int multiplyDeBruijnBitPosition[32] = {
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
unsigned int strangeThing(unsigned int i)
{
return multiplyDeBruijnBitPosition[((unsigned int)((i & -i) * 0x077cb531u)) >> 27];
}
//-----------------------------------------------------------------------------------------------------------------------
int main()
{
const unsigned int NUM_ITERATIONS = 1000000000;
//-------------------------------------------------------------------------------------------------------------------
// Compare Alex's idea to the strange thing
//-------------------------------------------------------------------------------------------------------------------
bool equivalent = true;
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
if (alexIdea(i) != strangeThing(i)) {
equivalent = false;
break;
}
std::cout << (equivalent ? "Equivalent functions" : "Alex screwed up") << std::endl;
//-------------------------------------------------------------------------------------------------------------------
// See if Alex's idea is faster than the strange thing
//-------------------------------------------------------------------------------------------------------------------
clock_t start;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
alexIdea(i);
std::cout << std::setprecision(51) << "Alex's idea time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
strangeThing(i);
std::cout << std::setprecision(51) << "Strange thing's time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
return 0;
}
我得到了一些奇怪的结果,似乎奇怪的功能快得不可思议:
Alexanders-MBP:Desktop alexandersimes$ !g
g++ foo.cpp -O3
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 9.99999999999999954748111825886258685613938723690808e-07
Strange thing's time: 0
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 1.99999999999999990949622365177251737122787744738162e-06
Strange thing's time: 9.99999999999999954748111825886258685613938723690808e-07
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 3.00000000000000007600257229123386082392244134098291e-06
Strange thing's time: 9.99999999999999954748111825886258685613938723690808e-07
我尝试添加条件(在 运行 时间内不会成立)以尝试强制计算两者。 main的计时部分改为:
//-------------------------------------------------------------------------------------------------------------------
// See if Alex's idea is faster than the strange thing
//-------------------------------------------------------------------------------------------------------------------
clock_t start;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
if (alexIdea(i) == 31) // This will never happen, but don't optimize out the calculation
std::cout << "This will never happen" << std::endl;
std::cout << std::fixed << std::setprecision(8)
<< "Alex's idea time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
if (strangeThing(i) == 31) // This will never happen, but don't optimize out the calculation
std::cout << "This will never happen" << std::endl;
std::cout << std::fixed << std::setprecision(8)
<< "Strange thing's time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;
导致:
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 0.00000200
Strange thing's time: 1.53144700
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 0.00000300
Strange thing's time: 1.44950600
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time: 0.00000200
Strange thing's time: 1.57773800
第二次计时有效吗?有没有更好的方法来计时功能?
创建一个变量 dummy
并将其初始化为 argc
。然后在每个循环中通过函数的 return 递增 dummy
。然后returndummy
。这应该会阻止编译器省略函数调用。