What is the difference between "cpu/mem-loads/pp" and "cpu/mem-loads/"?
I read the man page for perf list and found the following PMU event definitions for memory loads and stores:
mem-loads OR cpu/mem-loads/ [Kernel PMU event]
mem-stores OR cpu/mem-stores/ [Kernel PMU event]
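But the perf scripts I keep reading use "cpu/mem-loads/pp" instead of "cpu/mem-loads/". What is the difference between them? Are they the same? I tried googling for an answer but could not find an explanation.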
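The p modifier stands for precise level. When sampling, it expresses how much skid you can tolerate, that is, how far the reported instruction may be from the instruction that actually triggered the sample. pp means SAMPLE_IP is requested to have 0 skid. In other words, when you sample memory accesses, you want to know exactly which instruction produced the access. Without any p, as in "cpu/mem-loads/", the precise level is 0 and SAMPLE_IP can have arbitrary skid.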
See man perf list:
p - precise level
....
The p modifier can be used for specifying how precise the instruction address should be. The p modifier can be specified multiple times:
0 - SAMPLE_IP can have arbitrary skid
1 - SAMPLE_IP must have constant skid
2 - SAMPLE_IP requested to have 0 skid
3 - SAMPLE_IP must have 0 skid
For Intel systems precise event sampling is implemented with PEBS which supports up to precise-level 2.
On AMD systems it is implemented using IBS (up to precise-level 2). The precise modifier works with event types 0x76 (cpu-cycles, CPU clocks not halted) and 0xC1 (micro-ops
retired). Both events map to IBS execution sampling (IBS op) with the IBS Op Counter Control bit (IbsOpCntCtl) set respectively (see AMD64 Architecture Programmer’s Manual Volume
2: System Programming, 13.3 Instruction-Based Sampling). Examples to use IBS:
perf record -a -e cpu-cycles:p ... # use ibs op counting cycles
perf record -a -e r076:p ... # same as -e cpu-cycles:p
perf record -a -e r0C1:p ... # use ibs op counting micro-ops
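To make the mapping concrete, here is a minimal Python sketch that counts the p modifiers in an event string and looks up the matching line from the man page excerpt above. The helper name `precise_level` is hypothetical and this is not how perf itself parses events. It only assumes that modifiers follow the last "/" in PMU syntax or the last ":" in the plain event syntax, as in the examples quoted here.

```python
# Descriptions copied from the perf list man page excerpt above.
PRECISE_LEVELS = {
    0: "SAMPLE_IP can have arbitrary skid",
    1: "SAMPLE_IP must have constant skid",
    2: "SAMPLE_IP requested to have 0 skid",
    3: "SAMPLE_IP must have 0 skid",
}


def precise_level(event: str) -> int:
    """Return the number of 'p' modifiers in an event spec.

    Hypothetical helper for illustration only; perf's real parser
    handles many more cases than this.
    """
    if "/" in event:
        # PMU syntax: cpu/mem-loads/pp -> modifiers are "pp"
        modifiers = event.rsplit("/", 1)[1]
    elif ":" in event:
        # Plain syntax: cpu-cycles:p -> modifiers are "p"
        modifiers = event.rsplit(":", 1)[1]
    else:
        # No modifier section at all, e.g. mem-loads
        modifiers = ""
    level = modifiers.count("p")
    if level not in PRECISE_LEVELS:
        raise ValueError(f"at most 3 'p' modifiers are defined, got {level}")
    return level


if __name__ == "__main__":
    for spec in ("cpu/mem-loads/", "cpu/mem-loads/pp", "cpu-cycles:p"):
        level = precise_level(spec)
        print(f"{spec}: level {level}, {PRECISE_LEVELS[level]}")
```

With this sketch, "cpu/mem-loads/" maps to level 0 and "cpu/mem-loads/pp" maps to level 2, which is the difference the question asks about.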