perf 报告单次睡眠的多个 sched:sched_stat_sleep 事件

perf reports multiple sched:sched_stat_sleep event for a single sleep

我正在尝试分析示例应用程序

#include<stdio.h>
#include<unistd.h>

int calculate() {
        int i=0, ret;
        for(i=0;i<10000000;i++) {
                ret +=i;
        }
        return ret;
}

int main(int argc, char ** ch) {
        int loop;
        for(loop=0; loop<5; loop++){
                sleep(1);
                calculate();
        }
}

我使用 perf record 进行分析。

perf record -e sched:sched_stat_sleep -e sched:sched_switch -o perfnoallcpu.data -g ./loop

我看到每个开关都有 4 个 sched:sched_stat_sleep 报告。

 swapper     0/0     [000]   222.777113:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
         swapper     0/0     [000]   222.777116:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
         swapper     0/0     [000]   222.777117:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
         swapper     0/0     [000]   222.777118:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
            loop  2927/2927  [001]   222.777651:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   223.777227:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
         swapper     0/0     [001]   223.777230:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
         swapper     0/0     [001]   223.777231:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
         swapper     0/0     [001]   223.777232:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
            loop  2927/2927  [001]   223.803313:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   224.802882:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
         swapper     0/0     [001]   224.802884:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
         swapper     0/0     [001]   224.802885:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
         swapper     0/0     [001]   224.802885:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
            loop  2927/2927  [001]   224.814909:  999483089 sched:sched_stat_sleep: comm=kworker/1:1 pid=65 delay=999483089 [ns]
            loop  2927/2927  [001]   224.814913:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=65 next_prio=120
            loop  2927/2927  [001]   224.828492:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   225.828066:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
         swapper     0/0     [001]   225.828069:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
         swapper     0/0     [001]   225.828070:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
         swapper     0/0     [001]   225.828071:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
            loop  2927/2927  [001]   225.829873:  301152896 sched:sched_stat_sleep: comm=compiz pid=1723 delay=301152896 [ns]
            loop  2927/2927  [001]   225.829877:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=R ==> next_comm=compiz next_pid=1723 next_prio=120
            loop  2927/2927  [001]   225.853764:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   226.853341:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
         swapper     0/0     [001]   226.853343:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
         swapper     0/0     [001]   226.853344:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
         swapper     0/0     [001]   226.853345:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
            loop  2927/2927  [001]   226.878867:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   227.878442:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]
         swapper     0/0     [001]   227.878443:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]
         swapper     0/0     [001]   227.878444:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]
         swapper     0/0     [001]   227.878445:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]

sched_stat_sleep 的数量随着 cpu 数量的增加而增加。这使应用程序看起来睡了更多时间。

如果我在执行 perf 记录时指定 -a 选项,问题就消失了,我们每次睡眠只看到一个 sched_stat_sleep 事件。

这是一个 bug in the kernel, which affects also other events (the message from chengjian (D) shows that it affects sched:sched_stat_sleep). Apparently, the event is being counted as many times as there are CPUs (nrcpus). According to this 差异,该修复包含在内核版本 4.18.17 中(这确保该事件仅针对它发生的 CPU 计算。