How to interpret the report from perf

Question

I'm learning how to use the tool perf to profile my C++ project. Here is my code:

#include <iostream>
#include <thread>
#include <mutex>
#include <vector>
#include <memory>   // std::unique_ptr


std::mutex mtx;
long long_val = 0;

// Each call holds the global mutex for 1000 increments.
void do_something(long &val)
{
    std::unique_lock<std::mutex> lck(mtx);
    for(int j=0; j<1000; ++j)
        val++;
}


void thread_func()
{
    for(int i=0; i<1000000L; ++i)
    {
        do_something(long_val);
    }
}


int main(int argc, char* argv[])
{
    std::vector<std::unique_ptr<std::thread>> threads;
    for(int i=0; i<100; ++i)
    {
        threads.push_back(std::unique_ptr<std::thread>(new std::thread(thread_func)));
    }
    for(int i=0; i<100; ++i)
    {
        threads[i]->join();
    }
    threads.clear();
    std::cout << long_val << std::endl;
    return 0;
}

To compile it, I run g++ -std=c++11 main.cpp -lpthread -g and get an executable named a.out.

Then I run perf record --call-graph dwarf -- ./a.out, wait 10 seconds, and press Ctrl+C to interrupt ./a.out because it takes far too long to finish.

Finally, I run perf report -g graph --no-children, and this is the output:

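My goal is to find out which part of the code is the heaviest. The output seems to tell me that do_something is the heaviest part (46.25%). But when I expand do_something, I can't make sense of the entries underneath it: std::_Bind_simple, std::thread::_Impl, and so on.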

So how can I get more useful information out of the perf report output? Or is there nothing more to learn beyond the fact that do_something is the heaviest part?

Answer 1

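The problem here is that your threads are all waiting on the same mutex, which forces your program to go through the scheduler very often.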

You will get better performance if you use fewer threads.
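A minimal sketch of this suggestion, assuming that "fewer threads" means one thread per hardware thread as reported by std::thread::hardware_concurrency(). The helper run_with_fewer_threads and the fallback of 4 threads are hypothetical choices, not taken from the question. The total number of lock acquisitions stays fixed and each one does the same 1000 increments as do_something; only the number of threads competing for the mutex changes.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): perform `total_calls` lock
// acquisitions, each incrementing a counter 1000 times like do_something(),
// spread over one thread per hardware thread instead of 100 threads.
long run_with_fewer_threads(long total_calls)
{
    std::mutex mtx;
    long counter = 0;  // protected by mtx

    // hardware_concurrency() returns 0 when the count is unknown;
    // falling back to 4 threads is an arbitrary assumption.
    unsigned hw = std::thread::hardware_concurrency();
    const long n = (hw == 0) ? 4 : static_cast<long>(hw);

    auto worker = [&mtx, &counter](long calls) {
        for (long i = 0; i < calls; ++i) {
            std::lock_guard<std::mutex> lck(mtx);  // same critical section as do_something()
            for (int j = 0; j < 1000; ++j)
                counter++;
        }
    };

    std::vector<std::thread> threads;
    for (long t = 0; t < n; ++t) {
        // Split the work evenly; the first (total_calls % n) threads take one extra call.
        long calls = total_calls / n + (t < total_calls % n ? 1 : 0);
        threads.emplace_back(worker, calls);
    }
    for (auto &th : threads)
        th.join();  // all threads finish before mtx and counter go out of scope
    return counter;
}
```

For example, run_with_fewer_threads(10000) returns 10000000 (10000 acquisitions of 1000 increments each) no matter how many hardware threads the machine reports. Profiling this version under perf and comparing it with the 100-thread program is one way to check the claim on your own machine.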

Answer 2

With help from @Peter Cordes, I came up with this answer. If you have something better, feel free to post your own answer.

You forgot to enable optimization at all when you compiled, so all the little functions that should normally inline away are actually getting called. Add -O3 or at least -O2 to your g++ command line. Optionally, also use profile-guided optimization if you really want gcc to do a good job on hot loops.

After adding -O3, the output of perf report becomes:

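Now futex_wake and futex_wait_setup tell us something useful, because on Linux the C++11 std::mutex is implemented on top of the Linux futex system call. So the conclusion is that the mutex is the hotspot in this code.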
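To cross-check from inside the program that the mutex really is contended, one hedged approach (not part of the original answers) is to try the lock first and count how often that attempt fails before falling back to a blocking lock. The names measure_contention and ContentionStats are hypothetical. The standard allows try_lock() to fail spuriously, so the count may slightly overstate real contention, and the extra atomic counter adds a little overhead of its own.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct ContentionStats {
    long value;         // final counter value
    long acquisitions;  // total number of lock acquisitions
    long contended;     // acquisitions where try_lock() failed first
};

// Hypothetical helper: same critical section as do_something(), but each
// acquisition first tries a non-blocking lock and records whether it failed.
ContentionStats measure_contention(int num_threads, long calls_per_thread)
{
    std::mutex mtx;
    long counter = 0;                // protected by mtx
    std::atomic<long> contended(0);  // updated without holding mtx

    auto worker = [&]() {
        for (long i = 0; i < calls_per_thread; ++i) {
            // try_lock() fails if another thread holds the mutex
            // (the standard also allows it to fail spuriously).
            std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
            if (!lck.owns_lock()) {
                contended.fetch_add(1, std::memory_order_relaxed);
                lck.lock();  // blocking path; with glibc this may sleep in a futex wait
            }
            for (int j = 0; j < 1000; ++j)
                counter++;
        }  // lck unlocks at the end of each iteration
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker);
    for (auto &th : threads)
        th.join();

    ContentionStats stats = {counter, num_threads * calls_per_thread, contended.load()};
    return stats;
}
```

Calling measure_contention(100, 1000) with the question's thread count and comparing stats.contended with stats.acquisitions gives a rough contention ratio; the exact numbers vary from run to run.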

Tags: c++, c++11, profiler, perf