使用 -O3 的两个等效函数的公平计时

Fair timing of two equivalent functions with -O3

一位同事给我发了一些有趣的代码,用于计算整数的 LSB 索引:

unsigned int v;  // the input number
int r;           // result goes here
static const int MultiplyDeBruijnBitPosition[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];

我很好奇我是否能打败它(有一个等效的函数希望 运行s 更快)并且最初我写了这段代码来测试它:

#include <ctime>
#include <iomanip>
#include <iostream>

//-----------------------------------------------------------------------------------------------------------------------
unsigned int alexIdea(unsigned int i)
{
    switch (i & -i)
    {
    case 0x0u:
    case 0x1u:
        return 0;
    case 0x2u:
        return 1;
    case 0x4u:
        return 2;
    case 0x8u:
        return 3;
    case 0x10u:
        return 4;
    case 0x20u:
        return 5;
    case 0x40u:
        return 6;
    case 0x80u:
        return 7;
    case 0x100u:
        return 8;
    case 0x200u:
        return 9;
    case 0x400u:
        return 10;
    case 0x800u:
        return 11;
    case 0x1000u:
        return 12;
    case 0x2000u:
        return 13;
    case 0x4000u:
        return 14;
    case 0x8000u:
        return 15;
    case 0x10000u:
        return 16;
    case 0x20000u:
        return 17;
    case 0x40000u:
        return 18;
    case 0x80000u:
        return 19;
    case 0x100000u:
        return 20;
    case 0x200000u:
        return 21;
    case 0x400000u:
        return 22;
    case 0x800000u:
        return 23;
    case 0x1000000u:
        return 24;
    case 0x2000000u:
        return 25;
    case 0x4000000u:
        return 26;
    case 0x8000000u:
        return 27;
    case 0x10000000u:
        return 28;
    case 0x20000000u:
        return 29;
    case 0x40000000u:
        return 30;
    case 0x80000000u:
        return 31;
    }

    return 0;
}

//-----------------------------------------------------------------------------------------------------------------------
const int multiplyDeBruijnBitPosition[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

unsigned int strangeThing(unsigned int i)
{
    return multiplyDeBruijnBitPosition[((unsigned int)((i & -i) * 0x077cb531u)) >> 27];
}

//-----------------------------------------------------------------------------------------------------------------------
int main()
{
    const unsigned int NUM_ITERATIONS = 1000000000;

    //-------------------------------------------------------------------------------------------------------------------
    // Compare Alex's idea to the strange thing
    //-------------------------------------------------------------------------------------------------------------------
    bool equivalent = true;
    for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
        if (alexIdea(i) != strangeThing(i)) {
            equivalent = false;
            break;
        }
    std::cout << (equivalent ? "Equivalent functions" : "Alex screwed up") << std::endl;


    //-------------------------------------------------------------------------------------------------------------------
    // See if Alex's idea is faster than the strange thing
    //-------------------------------------------------------------------------------------------------------------------
    clock_t start;

    start = clock();
    for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
        alexIdea(i);
    std::cout << std::setprecision(51) << "Alex's idea time:     " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;

    start = clock();
    for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
        strangeThing(i);
    std::cout << std::setprecision(51) << "Strange thing's time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;

    return 0;
}

我得到了一些奇怪的结果,似乎奇怪的功能快得不可思议:

Alexanders-MBP:Desktop alexandersimes$ !g
g++ foo.cpp -O3
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time:     9.99999999999999954748111825886258685613938723690808e-07
Strange thing's time: 0
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time:     1.99999999999999990949622365177251737122787744738162e-06
Strange thing's time: 9.99999999999999954748111825886258685613938723690808e-07
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time:     3.00000000000000007600257229123386082392244134098291e-06
Strange thing's time: 9.99999999999999954748111825886258685613938723690808e-07

我尝试添加条件(在 运行 时间内不会成立)以尝试强制计算两者。 main的计时部分改为:

//-------------------------------------------------------------------------------------------------------------------
// See if Alex's idea is faster than the strange thing
//-------------------------------------------------------------------------------------------------------------------
clock_t start;

start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
    if (alexIdea(i) == 31) // This will never happen, but don't optimize out the calculation
        std::cout << "This will never happen" << std::endl;
std::cout << std::fixed << std::setprecision(8)
          << "Alex's idea time:     " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;

start = clock();
for (unsigned int i = 0; i < NUM_ITERATIONS; ++i)
    if (strangeThing(i) == 31)  // This will never happen, but don't optimize out the calculation
        std::cout << "This will never happen" << std::endl;
std::cout << std::fixed << std::setprecision(8)
          << "Strange thing's time: " << (double(clock() - start) / CLOCKS_PER_SEC) << std::endl;

导致:

Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time:     0.00000200
Strange thing's time: 1.53144700
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time:     0.00000300
Strange thing's time: 1.44950600
Alexanders-MBP:Desktop alexandersimes$ ./a.out
Equivalent functions
Alex's idea time:     0.00000200
Strange thing's time: 1.57773800

第二次计时有效吗?有没有更好的方法来计时功能?

创建一个变量 dummy 并将其初始化为 argc。然后在每个循环中通过函数的 return 递增 dummy。然后returndummy。这应该会阻止编译器省略函数调用。