Performance Analysis Of A Function

Question

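I have to call a function 100000 times in my program. If the argument passed to the function is 4 it returns 5, and if the argument is 5 it returns 4. I have the following two functions that perform the same task, one using subtraction and the other using division. I need the fastest code. Which one is better, and why?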

int function1( int x)
{
   return 9-x;
}

int function2(int x )
{
   return 20/x;
}

Answer 1

Use a bitwise operation.

Short and sweet:

int x = 5;
std::cout << (x ^ 1);  // outputs 4

x = 4;  // reuse x; declaring it a second time in the same scope would not compile
std::cout << (x ^ 1);  // outputs 5

You can check it here: http://ideone.com/tA1SrI

Answer 2

It depends (or may depend) on many factors:

What your code should do when it gets an argument other than 4 or 5
Your machine, compiler, optimization settings, ...

In any case, the simplest way to check which one is more efficient is to add a small timer and run each function many times, like this:

/* needs <stdio.h> and <sys/time.h> */
struct timeval t1, t2;
gettimeofday(&t1, NULL);

int i, a = 0;  /* start from 0 so the accumulated sum is well defined */
for (i = 0; i < 100000; i++)
    {
    a += function1(4);
    }

gettimeofday(&t2, NULL);
printf("Time elapsed: %li µs\n", (long)((t2.tv_sec - t1.tv_sec) * 1000000 + t2.tv_usec - t1.tv_usec));

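Warning: if you compile with optimization (gcc -O2 or similar), the compiler may optimize away unused variables. That is why I wrote a += ...: so the compiler thinks the result is used.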

Answer 3

When you want to know about performance, measure. I did:

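#include <cstdlib>
#include <iostream>

int function1(int x) { return 9 - x; }  // function under test, from the question

int main(int argc, char* argv[])
{
  if (argc < 2) return 1;  // the repetition count is expected as the first argument
  int total = 0;
  int rep = std::atoi(argv[1]);
  for (int ii = 0; ii < rep; ++ii)
  {
    total += function1(4 + ii % 2);  // alternates between 4 and 5
  }

  std::cout << total << '\n';
}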

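This program passes 4 and 5 alternately to the function, taking particular care to prevent the compiler from optimizing on the iteration count at compile time or eliding the computation altogether. Below are the results, obtained on a fairly ordinary Intel x86_64 Linux box using g++ -O3 -g -Wall, valgrind --tool=callgrind, and finally kcachegrind, showing the "Instruction read" count attributed to main():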

9 - x: 345K
20/ x: 1320K
x ^ 1: 345K

If you prefer wall-clock time, here it is:

9 - x: 0.48s
20/ x: 4.7s
x ^ 1: 0.48s

With all that in mind, stick with the subtraction version.

Finally, a note on what "Instruction read" means in valgrind's statistics (thanks to "Iwillnotexist Idonotexist" for the comment):

Instruction read is equal to the number of instructions fetched (and executed). This statistic breaks down and gives misleading results precisely around ultra-long instructions like division, and moreover ignores the processor's potential for superscalar execution and pipeline hazards.


c++

performance