OProfile with OpenMP
I am using OProfile on OpenMP-parallelized code by doing the following:
$ gcc -I/usr/include/hdf5/serial/ -std=c11 -O3 -fopt-info -fopenmp sp_linsvm.c -o sp_linsvm -lhdf5_serial
$ sudo ocount --events=CPU_CLK_UNHALTED,LLC_MISSES,LLC_REFS,MEM_INST_RETIRED,BR_MISP_EXEC, ./sp_linsvm
Events were actively counted for 22.0 seconds.
Event counts (scaled) for /home/aidan/progs/linsvm/sp_linsvm:
Event                  Count              % time counted
BR_MISP_EXEC           6,523,181          80.00
CPU_CLK_UNHALTED       225,384,009,348    80.00
LLC_MISSES             276,587,407        80.02
LLC_REFS               1,098,236,806      80.00
MEM_INST_RETIRED       51,754,855,734     79.99
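How can I tell whether the events are counted per CPU or for the program as a whole? I'm fairly sure they are counted as a whole, because the numbers are close to these when I compile without OpenMP, but I'd like to be certain.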
The default mode of ocount ... ./program is "command". As far as I understand, without the -t (--separate-thread) or -c (--separate-cpu) option, the data from all threads is aggregated.
So, check the documentation at http://oprofile.sourceforge.net/doc/controlling-counter.html#controlling-ocount and try the -t / -c options:
--separate-thread / -t
This option can be used in conjunction with either the --process-list or --thread-list option to display event counts on a per-thread (per-process) basis. Without this option, all counts are aggregated.
--separate-cpu / -c
This option can be used in conjunction with either the --system-wide or --cpu-list option to display event counts on a per-cpu basis. Without this option, all counts are aggregated.