PMU for multi-threaded environment
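I am planning to measure PMU counters for L1, L2, and L3 cache misses and branch prediction misses. I have read the relevant Intel documentation, but I am not sure about the following scenarios. Could someone please clarify?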
// assume PMU reset and PERFEVTSELx configuration done above
ioctl(fd, IOCTL_MSR_CMDS, (long long)msr_start); // PMU start counters
my_program();
ioctl(fd, IOCTL_MSR_CMDS, (long long)msr_stop);  // PMU stop
// now reading PMU counters
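1. What will happen if my process is scheduled to another core while my_program() is running?
2. What will happen if the process is scheduled out and then scheduled back onto the same core, and in the meantime some other process has reset the PMU counters?
How can I make sure that I read the correct values from the PMU counters?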
Machine details: CentOS with Linux kernel 3.10.0-327.22.2.el7.x86_64, running on an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Thanks
I got the answers from an Intel forum thread; the link is below.
https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/673602
Summary of the Intel forum thread started by the OP:
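The Linux perf subsystem virtualizes the performance counters, but this means you have to read them with a system call, instead of rdpmc, to get the full virtualized 64-bit value rather than whatever is currently in the architectural performance counter register.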
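A minimal sketch of reading a virtualized per-thread counter through the perf_event_open system call might look like the following. The helper names open_branch_miss_counter and count_around are hypothetical, the my_program body is only a stand-in for the code under test, and the choice of PERF_COUNT_HW_BRANCH_MISSES is an assumption based on the events named in the question. The open can fail depending on /proc/sys/kernel/perf_event_paranoid or when no hardware PMU is exposed, so the sketch reports that as -1 instead of assuming success.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for the code under test (hypothetical body). */
void my_program(void)
{
    volatile unsigned acc = 0;
    for (unsigned i = 0; i < 100000; i++)
        if ((i * 2654435761u) & 0x10000u)   /* irregular branch pattern */
            acc += i;
}

/* Hypothetical helper: open a counter for branch mispredictions that
 * follows the calling thread. Returns a file descriptor, or -1 if the
 * kernel refuses or the event is unavailable. */
int open_branch_miss_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user-space only */
    attr.exclude_hv = 1;
    /* glibc has no wrapper, so go through syscall(2).
     * pid = 0, cpu = -1: this thread, on whichever core it runs. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Hypothetical helper: run work() with the counter enabled.
 * Returns 0 and stores the 64-bit count in *count, or -1 on error. */
int count_around(int fd, void (*work)(void), uint64_t *count)
{
    assert(work != NULL && count != NULL);
    if (fd < 0)
        return -1;
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1)
        return -1;
    work();
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    /* With read_format left at 0, read() returns one 64-bit value. */
    return read(fd, count, sizeof *count) == (ssize_t)sizeof *count ? 0 : -1;
}
```

A caller would open the counter once, call count_around(fd, my_program, &n), and close(fd) afterwards. Because the kernel saves and restores the count per thread, migration to another core should not corrupt the value.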
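If you want to use rdpmc inside your own code so it can measure itself, pin each thread to a core, because context switches don't save/restore the PMCs. There is no easy way to avoid measuring everything that happens on the core, including interrupt handlers and other processes that get a timeslice. This can be a good thing, since you need to take the impact of kernel overhead into account.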
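One way to pin the calling thread on Linux is sched_setaffinity. The sketch below assumes glibc with _GNU_SOURCE, and the helper name pin_to_core is hypothetical. It only restricts where the thread may run; it does not stop other processes from being scheduled on the same core.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single core.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
int pin_to_core(int core)
{
    assert(core >= 0 && core < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof set, &set);
}
```

Calling pin_to_core() before programming and starting the counters keeps the start, the measured code, and the stop on the same core's PMU.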
More useful quotes from Dr. John D. McCalpin ("Dr. Bandwidth"):
For inline code instrumentation you should be able to use the "perf events" API, but the documentation is minimal. Some resources are available at http://web.eece.maine.edu/~vweaver/projects/perf_events/faq.html
You can use "pread()" on the /dev/cpu/*/msr device files to read the
MSRs -- this may be a bit easier to read than IOCTL-based code. The
codes "rdmsr.c" and "wrmsr.c" from "msr-tools-1.3" provide excellent
examples.
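As a rough illustration of the pread() approach in the quote above: the msr driver treats the file offset as the MSR address and returns 8 bytes per register. The function names read_msr_at and read_msr are hypothetical. Reading /dev/cpu/N/msr normally requires root and the msr kernel module to be loaded, so treat this as a sketch under those assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read 8 bytes at byte offset `reg` from `path`.
 * For an msr device file, the offset is the MSR address.
 * Returns 0 on success, -1 on open failure or short read. */
int read_msr_at(const char *path, uint32_t reg, uint64_t *value)
{
    assert(path != NULL && value != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Hypothetical convenience wrapper: read MSR `reg` on logical CPU `cpu`. */
int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    return read_msr_at(path, reg, value);
}
```

For example, read_msr(0, 0xC1, &v) would attempt to read IA32_PMC0 on CPU 0. The value is only meaningful if the thread that programmed the counter was running on that same core.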
There have been a number of approaches to reserving and sharing
performance counters, including both software-only and combined
hardware+software approaches, but at this point there is not a
"standard" approach. (It looks like Intel has a hardware-based
approach using MSR 0x392 IA32_PERF_GLOBAL_INUSE, but I don't know what
platforms support it.)
Your questions
what will happen if my process is scheduled out when my_program() is running, and scheduled to another core?
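You'll see random garbage, because the PMCs are per-core and are not saved or restored on a context switch. The same is true if another process resets the PMCs between timeslices of your process.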