Formulas in perf stat
I would like to know the formulas that perf stat uses to calculate its derived figures from the raw counts.
perf stat -e task-clock,cycles,instructions,cache-references,cache-misses ./myapp
1080267.226401 task-clock (msec) # 19.062 CPUs utilized
1,592,123,216,789 cycles # 1.474 GHz (50.00%)
871,190,006,655 instructions # 0.55 insn per cycle (75.00%)
3,697,548,810 cache-references # 3.423 M/sec (75.00%)
459,457,321 cache-misses # 12.426 % of all cache refs (75.00%)
In this case, how is the M/sec value calculated from cache-references?
The formulas do not appear to be in builtin-stat.c (where the default event sets for perf stat are defined); they are probably calculated (and averaged, with stddev) in perf_stat__print_shadow_stats(), and some stats are collected into arrays in perf_stat__update_shadow_stats():
http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L626
When HW_INSTRUCTIONS is counted: "Instructions per clock" = HW_INSTRUCTIONS / HW_CPU_CYCLES; "stalled cycles per instruction" = HW_STALLED_CYCLES_FRONTEND / HW_INSTRUCTIONS
if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
    total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
    if (total) {
        ratio = avg / total;
        print_metric(ctxp, NULL, "%7.2f ",
                     "insn per cycle", ratio);
    } else {
        print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
    }
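As a rough check against the numbers in the question, the division can be reproduced outside perf. This is only a sketch: insn_per_cycle() and print_insn_per_cycle() are hypothetical helper names, not perf functions, and the zero-cycle fallback is an assumption made for the example.

```c
#include <stdio.h>

/* Hypothetical helper (not part of perf): same division as
 * "ratio = avg / total" in the excerpt above. */
double insn_per_cycle(double instructions, double cycles)
{
    /* Assumption for this sketch: report 0 when no cycles were counted. */
    return cycles > 0.0 ? instructions / cycles : 0.0;
}

/* Formats the ratio with the same "%7.2f " pattern the excerpt uses. */
void print_insn_per_cycle(double instructions, double cycles)
{
    printf("%7.2f insn per cycle\n", insn_per_cycle(instructions, cycles));
}
```

With the counts from the question, print_insn_per_cycle(871190006655.0, 1592123216789.0) gives a ratio of about 0.547, which rounds to the 0.55 shown in the perf stat output.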
Branch misses come from print_branch_misses, computed as HW_BRANCH_MISSES / HW_BRANCH_INSTRUCTIONS.
perf_stat__print_shadow_stats() also contains several cache miss ratio calculations of the same form, such as HW_CACHE_MISSES / HW_CACHE_REFERENCES, plus some more detailed ones (perf stat -d mode).
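The "% of all cache refs" figure in the question appears to follow this ratio. A minimal sketch, assuming the percentage is simply 100 times misses over references; miss_percent() is a hypothetical name used only for illustration.

```c
/* Hypothetical helper (not part of perf): miss ratio as a percentage,
 * i.e. 100 * HW_CACHE_MISSES / HW_CACHE_REFERENCES. */
double miss_percent(double misses, double references)
{
    /* Assumption for this sketch: report 0 when no references were counted. */
    return references > 0.0 ? 100.0 * misses / references : 0.0;
}
```

For the question's counts, miss_percent(459457321.0, 3697548810.0) is about 12.426, in line with the "12.426 % of all cache refs" line.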
Stalled percentages are computed as HW_STALLED_CYCLES_FRONTEND / HW_CPU_CYCLES and HW_STALLED_CYCLES_BACKEND / HW_CPU_CYCLES.
GHz is computed as HW_CPU_CYCLES / runtime_nsecs_stats, where runtime_nsecs_stats is updated from either of the software events task-clock or cpu-clock (SW_TASK_CLOCK and SW_CPU_CLOCK; the exact difference between the two is still unclear, as discussed on LKML in 2010 and on SO in 2014):
if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
        update_stats(&runtime_nsecs_stats[cpu], count[0]);
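Since cycles divided by nanoseconds is already a value in GHz, the figure can be checked by hand. The sketch below assumes the task-clock value printed in msec is held internally in nanoseconds; cycles_ghz() is a hypothetical name, not a perf function.

```c
/* Hypothetical helper (not part of perf): cycles per nanosecond,
 * which is numerically the clock rate in GHz. */
double cycles_ghz(double cycles, double runtime_ns)
{
    /* Assumption for this sketch: report 0 when no runtime was recorded. */
    return runtime_ns > 0.0 ? cycles / runtime_ns : 0.0;
}
```

With the question's data, 1080267.226401 msec of task-clock is 1080267226401 ns, and cycles_ghz(1592123216789.0, 1080267226401.0) is about 1.4738, consistent with the printed 1.474 GHz.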
There are also several formulas for transactions (perf stat -T mode).
"CPUs utilized" is task-clock or cpu-clock / walltime_nsecs_stats, where the wall time is calculated by perf stat itself, in userspace, using a wall clock (astronomic time):
static inline unsigned long long rdclock(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
...
static int __run_perf_stat(int argc, const char **argv)
{
    ...
    /*
     * Enable counters and exec the command:
     */
    t0 = rdclock();
    clock_gettime(CLOCK_MONOTONIC, &ref_time);
    if (forks) {
        ....
    }
    t1 = rdclock();
    update_stats(&walltime_nsecs_stats, t1 - t0);
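The "CPUs utilized" ratio can be sketched the same way. The helper name cpus_utilized() is hypothetical, and both arguments are assumed to be in nanoseconds.

```c
/* Hypothetical helper (not part of perf): CPU time consumed by the
 * task divided by elapsed wall-clock time, both in nanoseconds. */
double cpus_utilized(double task_clock_ns, double walltime_ns)
{
    /* Assumption for this sketch: report 0 when no wall time elapsed. */
    return walltime_ns > 0.0 ? task_clock_ns / walltime_ns : 0.0;
}
```

For instance, a task that accumulates 2 s of task-clock during 1 s of wall time gives cpus_utilized(2e9, 1e9) = 2.0. The question's output does not show the elapsed time, but dividing 1080267.226401 msec by the printed 19.062 suggests a wall time of roughly 56.67 seconds; that figure is an inference, not something stated in the output.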
There are also some estimations from the Top-Down methodology (Tuning Applications Using a Top-down Microarchitecture Analysis Method; Software Optimizations Become Simple with Top-Down Analysis .. Name Skylake, IDF2015; #22 in Gregg's Methodology List). They were described in 2016 by Andi Kleen in "Add top down metrics to perf stat", https://lwn.net/Articles/688335/ (perf stat --topdown -I 1000 cmd mode).
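Finally, if there is no exact formula for the event currently being printed, there is a generic "%c/sec" (K/sec or M/sec) metric: http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L845 Any count is divided by the runtime in nanoseconds (from the task-clock or cpu-clock events, if they are present in the perf stat event set):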
} else if (runtime_nsecs_stats[cpu].n != 0) {
    char unit = 'M';
    char unit_buf[10];

    total = avg_stats(&runtime_nsecs_stats[cpu]);
    if (total)
        ratio = 1000.0 * avg / total;
    if (ratio < 0.001) {
        ratio *= 1000;
        unit = 'K';
    }
    snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
    print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
}
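This answers the M/sec question directly: 1000 * count / runtime_ns is events per microsecond, which is millions of events per second. A standalone sketch of the same arithmetic follows; generic_rate() is a hypothetical name, and the logic simply mirrors the excerpt above.

```c
/* Hypothetical helper (not part of perf): mirrors the generic "%c/sec"
 * branch. Returns the rate and stores 'M' or 'K' in *unit. */
double generic_rate(double count, double runtime_ns, char *unit)
{
    double ratio = 0.0;

    *unit = 'M';
    if (runtime_ns > 0.0)
        ratio = 1000.0 * count / runtime_ns; /* events per microsecond */
    if (ratio < 0.001) {
        /* Same threshold as the excerpt: switch from M/sec to K/sec. */
        ratio *= 1000;
        *unit = 'K';
    }
    return ratio;
}
```

For cache-references in the question, generic_rate(3697548810.0, 1080267226401.0, &unit) returns about 3.4228 with unit 'M', which matches the printed 3.423 M/sec. Note that the divisor is the task-clock runtime, not the wall time.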