Perf Reports Some Direct Jump Instructions as Memory Access Instructions

I sampled evince's user-space read accesses to DRAM with the following perf command:

perf record -d --call-graph dwarf -c 100 -e mem_load_uops_retired.l3_miss:uppp /opt/evince-3.28.4/bin/evince

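As you can see, I used the PEBS feature to improve sampling accuracy. However, some instructions that are not memory accesses are reported as memory accesses. For example, here is one sampled event as reported by perf script: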
evince 20589 16079.401401:        100 mem_load_uops_retired.l3_miss:uppp:     555555860750         5080022 N/A|SNP N/A|TLB N/A|LCK N/A
    555555579939 ev_history_can_go_back+0x19 (/opt/evince-3.28.4/bin/evince)
    5555555862ef ev_window_update_actions_sensitivity+0xa1f (/opt/evince-3.28.4/bin/evince)
    55555558ce4f ev_window_page_changed_cb+0xf (/opt/evince-3.28.4/bin/evince)
    7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff7140d76 emit_value_changed+0xf6 (inlined)
    7ffff7140d76 adjustment_set_value+0xf6 (inlined)
    7ffff7140d76 gtk_adjustment_set_value_internal+0xf6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff5757de7 signal_emit_unlocked_R+0xcd7 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff575fc7f g_signal_emitv+0x27f (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff7153519 gtk_binding_entry_activate+0x289 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff71539ef binding_activate+0x5f (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7153b7f gtk_bindings_activate_list+0x17f (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7154cd8 gtk_bindings_activate_event+0x98 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7973959 ev_view_key_press_event+0x59 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
    7ffff72698f6 _gtk_marshal_BOOLEAN__BOXEDv+0xa6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff574524f _g_closure_invoke_va+0xbf (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff57603cc g_signal_emit_valist+0x72c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff73b1533 gtk_widget_event_internal+0x163 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff73d1f0a gtk_window_propagate_key_event+0xfa (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    5555555894b1 ev_window_key_press_event+0x31 (/opt/evince-3.28.4/bin/evince)
    7ffff72698f6 _gtk_marshal_BOOLEAN__BOXEDv+0xa6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff5745345 _g_closure_invoke_va+0x1b5 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff57603cc g_signal_emit_valist+0x72c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff73b1533 gtk_widget_event_internal+0x163 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff726693e propagate_event+0x21e (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7268947 gtk_main_do_event+0x7f7 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff6d79764 _gdk_event_emit+0x24 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
    7ffff6da9f91 gdk_event_source_dispatch+0x21 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
    7ffff546a416 g_main_dispatch+0x2e6 (inlined)
    7ffff546a416 g_main_context_dispatch+0x2e6 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff546a64f g_main_context_iterate+0x1ff (inlined)
    7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
    555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
    7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
    555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
ffffffffffffffff [unknown] ([unknown])

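This means that the instruction at 0x555555579939 (offset 0x19 of function ev_history_can_go_back() in evince's text section) is reported as a load from address 0x555555860750. That instruction is the last line of the following snippet: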

0000000000025920 <ev_history_can_go_back>:
   25920:       53                      push   %rbx
   25921:       48 89 fb                mov    %rdi,%rbx
   25924:       e8 67 fa ff ff          callq  25390 <ev_history_get_type>
   25929:       48 85 db                test   %rbx,%rbx
   2592c:       74 42                   je     25970 <ev_history_can_go_back+0x50>
   2592e:       48 8b 13                mov    (%rbx),%rdx
   25931:       48 85 d2                test   %rdx,%rdx
   25934:       74 05                   je     2593b <ev_history_can_go_back+0x1b>
   25936:       48 39 02                cmp    %rax,(%rdx)
   25939:       74 0f                   je     2594a <ev_history_can_go_back+0x2a>

It is a jump to ev_history_can_go_back+0x2a and obviously not an access to [heap] at address 0x555555860750. Is this a perf bug?


Update

What about the following backtrace?

11159097179866 0xfb80 [0x1778]: PERF_RECORD_SAMPLE(IP, 0x4002): 7309/7309: 0x7ffff6d6c310 period: 10000 addr: 0x7ffff7034e50
... FP chain: nr:0
... user regs: mask 0xff0fff ABI 64-bit
.... AX    0x555555b8b4c0
.... BX    0x555555c48e10
.... CX    0x1
.... DX    0x7fffffffd988
.... SI    0x7fffffffd980
.... DI    0x555555b8b4c0
.... BP    0x258
.... SP    0x7fffffffd978
.... IP    0x7ffff6d6c310
.... FLAGS 0x20e
.... CS    0x33
.... SS    0x2b
.... R8    0x27c
.... R9    0x24
.... R10   0x2a2
.... R11   0x0
.... R12   0x258
.... R13   0x555555b8b4c0
.... R14   0x3000
.... R15   0x7ffff5747000
... ustack: size 5768, offset 0xd8
 . data_src: 0x5080022
 ... thread: evince:7309
 ...... dso: /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30
evince  7309 11159.097179:      10000    mem_load_uops_retired.l3_miss:uppp:     7ffff7034e50         5080022 N/A|SNP N/A|TLB N/A|LCK N/A
        7ffff6d6c310 cairo_surface_get_device_scale@plt+0x0 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d91029 gdk_window_create_similar_surface+0xc9 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d95410 gdk_window_begin_paint_internal+0x350 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d956f1 gdk_window_begin_draw_frame+0xc1 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff73c4942 gtk_widget_render+0xd2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
        7ffff7268858 gtk_main_do_event+0x708 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
        7ffff6d79764 _gdk_event_emit+0x24 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d897f4 _gdk_window_process_updates_recurse_helper+0x104 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d8a9f5 gdk_window_process_updates_internal+0x165 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d8abef gdk_window_process_updates_with_mode+0x11f (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff6d82ac8 gdk_frame_clock_paint_idle+0x3c8 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d6e07f gdk_threads_dispatch+0x1f (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff546ad02 g_timeout_dispatch+0x12 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
        7ffff546a284 g_main_dispatch+0x154 (inlined)
        7ffff546a284 g_main_context_dispatch+0x154 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
        7ffff546a64f g_main_context_iterate+0x1ff (inlined)
        7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
        7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
        555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
        7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
        555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
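As a side check, the data_src value 0x5080022 that appears in both samples can be decoded by hand. The sketch below assumes the bit-field layout of union perf_mem_data_src as I recall it from linux/perf_event.h (mem_op:5, mem_lvl:14, mem_snoop:5, mem_lock:2, mem_dtlb:7, low bits first); verify it against your own kernel headers before relying on it. The helper name decode_data_src is made up for this example.

```python
# Field widths of union perf_mem_data_src, lowest bits first.
# ASSUMPTION: layout recalled from linux/perf_event.h; check your headers.
FIELDS = [
    ("mem_op", 5),
    ("mem_lvl", 14),
    ("mem_snoop", 5),
    ("mem_lock", 2),
    ("mem_dtlb", 7),
]

# PERF_MEM_OP_* flag values (same assumption as above).
PERF_MEM_OP = {0x01: "NA", 0x02: "LOAD", 0x04: "STORE", 0x08: "PFETCH", 0x10: "EXEC"}


def decode_data_src(value):
    """Split a raw data_src value into its bit fields."""
    out = {}
    shift = 0
    for name, width in FIELDS:
        out[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return out


decoded = decode_data_src(0x5080022)
print(decoded)
print("op:", PERF_MEM_OP.get(decoded["mem_op"], "?"))
```

Under that layout the operation field is LOAD and every other field is the "not available" value 0x01, which agrees with the "N/A|SNP N/A|TLB N/A|LCK N/A" columns printed by perf script.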

The sampled access is at offset 0 in the following disassembly:

Dump of assembler code for function cairo_surface_get_device_scale@plt:
   0x000000000002a310 <+0>:     jmpq   *0x2c8b3a(%rip)        # 0x2f2e50
   0x000000000002a316 <+6>:     pushq  $0x1c7
   0x000000000002a31b <+11>:    jmpq   0x28690

This is an unconditional jump, so macro-fusion cannot be the explanation here.
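One thing that can be checked with plain arithmetic is whether the sampled data address lines up with the memory operand of that jmpq. The sketch below uses only numbers copied from the sample and the disassembly above. It assumes the offsets printed by the disassembler are relative to the library's load base, which is not stated in the post; the variable names are mine.

```python
# Values copied from the perf sample and the disassembly in this update.
SAMPLE_IP = 0x7FFFF6D6C310    # runtime IP: cairo_surface_get_device_scale@plt+0x0
SAMPLE_ADDR = 0x7FFFF7034E50  # data address reported by perf (addr:)
PLT_OFFSET = 0x2A310          # offset of the jmpq in libgdk-3
NEXT_INSN_OFFSET = 0x2A316    # offset of the instruction after the jmpq
DISPLACEMENT = 0x2C8B3A       # RIP-relative displacement of the jmpq operand

# ASSUMPTION: disassembly offsets are relative to the load base of the library.
load_base = SAMPLE_IP - PLT_OFFSET

# RIP-relative addressing is relative to the address of the next instruction.
slot_offset = NEXT_INSN_OFFSET + DISPLACEMENT
slot_runtime = load_base + slot_offset

print("load base:        ", hex(load_base))
print("operand offset:   ", hex(slot_offset))
print("operand address:  ", hex(slot_runtime))
print("matches perf addr:", slot_runtime == SAMPLE_ADDR)
```

The computed operand offset is 0x2f2e50, the same value the disassembler shows in its comment, and the resulting runtime address equals the addr field of the sample. So, under the stated assumption, the reported address is the memory slot from which this indirect jump reads its target.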

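At least on Intel CPUs, cmp %rax,(%rdx) can macro-fuse with the following je while also micro-fusing the load. See https://agner.org/optimize/. Also related: Micro fusion and addressing modes (this is a non-indexed addressing mode, so it can stay micro-fused even on Sandybridge/IvyBridge).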

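So in the fused domain (where retirement happens), you really do have a single-uop compare-and-branch with a memory source. Note that mem_load_uops_retired.l3_miss:uppp counts uops, not instructions.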

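Even in the unfused domain, a macro-fused compare/branch executes as a single uop on a single execution unit, but the load still has to execute separately, on a load port. (Micro-fusion saves decode/issue front-end bandwidth and uop cache space, but not back-end ports.)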
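To make the counting argument concrete, here is a toy model of the tail of ev_history_can_go_back. It is only a sketch and not a CPU simulator: it assumes each of these simple instructions is one fused-domain uop and that a cmp or test immediately followed by a conditional jump always macro-fuses, as described above. The Insn class and fused_domain_uops function are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    text: str
    loads: int = 0             # memory loads performed by the instruction
    fusible_cmp: bool = False  # cmp/test that may macro-fuse with a following jcc
    is_jcc: bool = False       # conditional jump


def fused_domain_uops(insns):
    """Group instructions into fused-domain uops (toy rule: cmp/test + jcc fuse)."""
    uops = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if cur.fusible_cmp and nxt is not None and nxt.is_jcc:
            # One uop that covers both instructions and carries the load.
            uops.append(([cur.text, nxt.text], cur.loads + nxt.loads))
            i += 2
        else:
            uops.append(([cur.text], cur.loads))
            i += 1
    return uops


snippet = [
    Insn("mov (%rbx),%rdx", loads=1),
    Insn("test %rdx,%rdx", fusible_cmp=True),
    Insn("je +0x1b", is_jcc=True),
    Insn("cmp %rax,(%rdx)", loads=1, fusible_cmp=True),
    Insn("je +0x2a", is_jcc=True),
]

uops = fused_domain_uops(snippet)
for texts, loads in uops:
    print(" + ".join(texts), "| loads:", loads)
```

In this model the five instructions become three fused-domain uops, and the last one contains both the cmp with its load and the je. An event that counts retired load uops would therefore be tagging a uop that spans the jump at +0x19.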