Hardware cache events and perf
When I run perf list, I see a bunch of Hardware Cache Events, as follows:
$ perf list | grep 'cache event'
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
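Based on testing, most of these events seem to return reasonable values, but how can I determine which hardware performance counter events they map to on my system?
That is, these events must be implemented using one or more of the underlying x86 PMU counters on my Skylake CPU, but how do I know which ones?
You can look up the other hardware events in /sys/devices/cpu/events, but not the "Hardware cache events".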
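For reference, the sysfs lookup mentioned above can be sketched as a small shell helper. The function name `sysfs_event` and its optional directory argument are hypothetical, added only for illustration; the default path is the one from the question.

```shell
# Print the raw encoding that sysfs advertises for a named event, if any.
# $1 = event name, $2 = optional directory (defaults to the path from the question).
sysfs_event() {
    dir=${2:-/sys/devices/cpu/events}
    if [ -r "$dir/$1" ]; then
        cat "$dir/$1"
    else
        echo "$1: not listed in $dir"
    fi
}

# A generic hardware event; on Intel this file typically holds an
# event=...,umask=... string.
sysfs_event cache-misses

# A hardware cache event; per the question, no such file is expected here.
sysfs_event L1-dcache-loads
```

On a machine where the directory exists, the first call shows the encoding directly, while the second falls through to the "not listed" branch, which is exactly the gap the question is about.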
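User @Margaret pointed toward a reasonable answer: read the kernel source to see the mapping for the PMU events.
We can check arch/x86/events/intel/core.c for the event definitions. I don't actually know whether "core" here refers to the Core architecture or just means that this is the core file where most of the definitions live, but either way it is the file you want to look at.
The key part is the section that defines skl_hw_cache_event_ids: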
static __initconst const u64 skl_hw_cache_event_ids
                [PERF_COUNT_HW_CACHE_MAX]
                [PERF_COUNT_HW_CACHE_OP_MAX]
                [PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
    [ C(L1D ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_INST_RETIRED.ALL_LOADS */
            [ C(RESULT_MISS)   ] = 0x151,  /* L1D.REPLACEMENT */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_INST_RETIRED.ALL_STORES */
            [ C(RESULT_MISS)   ] = 0x0,
        },
        [ C(OP_PREFETCH) ] = {
            [ C(RESULT_ACCESS) ] = 0x0,
            [ C(RESULT_MISS)   ] = 0x0,
        },
    },
...
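The three array indices line up with how a generic cache event is packed into perf_event_attr.config, as described in perf_event_open(2): cache id in the low byte, operation in the next byte, result in the byte after that. A minimal sketch of that packing follows; the helper name `hw_cache_config` is hypothetical, and the numeric enum values are taken from linux/perf_event.h (L1D = 0, OP_READ = 0, RESULT_ACCESS = 0, RESULT_MISS = 1).

```shell
# Pack (cache id, op id, result id) into a PERF_TYPE_HW_CACHE config value:
#   config = cache_id | (op_id << 8) | (result_id << 16)
hw_cache_config() {
    printf '0x%x\n' $(( $1 | ($2 << 8) | ($3 << 16) ))
}

hw_cache_config 0 0 0   # L1D, OP_READ, RESULT_ACCESS -> L1-dcache-loads
hw_cache_config 0 0 1   # L1D, OP_READ, RESULT_MISS   -> L1-dcache-load-misses
```

The kernel uses the same three fields to index the table above, so each generic event name selects exactly one cell.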
Decoding the nested initializers, you get that L1-dcache-loads corresponds to MEM_INST_RETIRED.ALL_LOADS and L1-dcache-load-misses corresponds to L1D.REPLACEMENT.
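The hex values in the table can be split into the fields perf accepts on the command line. This is a minimal sketch assuming the usual Intel core PMU layout (bits 0-7 are the event select, bits 8-15 the unit mask); the helper name `decode_raw` is hypothetical.

```shell
# Split a raw code from the kernel table into event select and umask.
# Assumes: event = bits 0-7, umask = bits 8-15 of the value.
decode_raw() {
    code=$(( $1 ))
    printf 'event=0x%02x,umask=0x%02x\n' $(( code & 0xff )) $(( (code >> 8) & 0xff ))
}

decode_raw 0x81d0   # MEM_INST_RETIRED.ALL_LOADS
decode_raw 0x151    # L1D.REPLACEMENT
```

The resulting string can be used as `-e cpu/event=0xd0,umask=0x81/`, or the original value can be passed as a raw event such as `-e r81d0`, to count the same thing without the symbolic name.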
We can double-check this with perf:
$ ocperf stat -e mem_inst_retired.all_loads,L1-dcache-loads,l1d.replacement,L1-dcache-load-misses,L1-dcache-loads,mem_load_retired.l1_hit head -c100M /dev/zero > /dev/null
Performance counter stats for 'head -c100M /dev/zero':
11,587,793 mem_inst_retired_all_loads
11,587,793 L1-dcache-loads
20,233 l1d_replacement
20,233 L1-dcache-load-misses # 0.17% of all L1-dcache hits
11,587,793 L1-dcache-loads
11,495,053 mem_load_retired_l1_hit
0.024322360 seconds time elapsed
The "Hardware Cache" events show exactly the same values as the underlying PMU events we guessed from reading the source.
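The 0.17% annotation next to L1-dcache-load-misses is consistent with dividing the miss count by the L1-dcache-loads count. A quick arithmetic check, assuming that is how perf derives the figure (the helper name `miss_pct` is hypothetical):

```shell
# Recompute a miss percentage from two counter values.
# $1 = miss count, $2 = total load count.
miss_pct() {
    awk -v miss="$1" -v total="$2" 'BEGIN { printf "%.2f%%\n", 100 * miss / total }'
}

# Values taken from the perf stat output above.
miss_pct 20233 11587793
```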