PMU for multi threaded environment

Question

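I plan to measure PMU counters for L1, L2, and L3 cache misses and for branch mispredictions. I have read the relevant Intel documentation, but I am unsure about the scenarios below. Could someone please clarify?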

// assume the PMU reset and PERFEVTSELx configuration were done above
ioctl(fd, IOCTL_MSR_CMDS, (long long)msr_start);  // start the PMU counters
my_program();
ioctl(fd, IOCTL_MSR_CMDS, (long long)msr_stop);   // stop the PMU counters
// now read the PMU counters

1. What happens if my process is scheduled onto another core while my_program() is running?

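2. What happens if the process is scheduled out and later scheduled back onto the same core, and in the meantime some other process resets the PMU counters?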

How can we make sure that we read correct values from the PMU counters?

Machine details: CentOS with Linux kernel 3.10.0-327.22.2.el7.x86_64, running on an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Thanks

Answer 1

I got the answers on an Intel forum; the link is below.

https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/673602

Answer 2

Summary of the Intel forum thread the OP started:

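The Linux perf subsystem virtualizes the performance counters, but that means you have to read them with a system call rather than rdpmc to get the full virtualized 64-bit value, instead of whatever is currently in the architectural performance counter register.
If you want to use rdpmc inside your own code so that it can measure itself, pin each thread to a core, because context switches do not save/restore the PMCs. There is no easy way to avoid measuring everything that happens on the core, including interrupt handlers and other processes that get a timeslice. This can be a good thing, since you need to take the impact of kernel overhead into account.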
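As an illustration of the pinning step, here is a minimal sketch that uses the Linux sched_setaffinity() call. The helper name pin_to_cpu is made up for this example and is not from the forum thread; each thread would call it once before starting its counters.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: pin the calling thread to one logical CPU.
 * Returns 0 on success, -1 if the CPU number is out of range or
 * sched_setaffinity() fails. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Pinning only stops the thread from migrating. Other processes and interrupt handlers can still run on that core and will still be counted.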

More useful quotes from Dr. John D. McCalpin ("Dr. Bandwidth"):

For inline code instrumentation you should be able to use the "perf events" API, but the documentation is minimal. Some resources are available at http://web.eece.maine.edu/~vweaver/projects/perf_events/faq.html
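A rough sketch of what inline instrumentation through the "perf events" API might look like, assuming the generic PERF_COUNT_HW_* events are enough for your purposes. The helper names perf_open and count_event are invented for this example, and my_program() here is only a stand-in workload. Because the event is opened with pid = 0 and cpu = -1, the kernel keeps the count per thread across context switches and migrations.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for the code under test (placeholder workload). */
static void my_program(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; i++)
        sum += i;
}

/* glibc provides no wrapper for perf_event_open, so go through syscall(2).
 * Returns a file descriptor, or -1 on failure. */
static int perf_open(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;        /* start stopped; enable explicitly below */
    attr.exclude_kernel = 1;  /* count user-space only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: the calling thread, on whichever CPU it runs */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Count one event around fn(). Returns 0 and stores the count,
 * or -1 if the event could not be opened or read. */
static int count_event(uint32_t type, uint64_t config,
                       void (*fn)(void), uint64_t *count)
{
    ssize_t n;
    int fd = perf_open(type, config);

    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    n = read(fd, count, sizeof(*count));  /* full 64-bit virtualized value */
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

For example, count_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES, my_program, &n) would count branch mispredictions, and PERF_COUNT_HW_CACHE_MISSES would count last-level cache misses. The call can fail with -1 when /proc/sys/kernel/perf_event_paranoid or the environment does not permit the event.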

You can use "pread()" on the /dev/cpu/*/msr device files to read the MSRs -- this may be a bit easier to read than IOCTL-based code. The codes "rdmsr.c" and "wrmsr.c" from "msr-tools-1.3" provide excellent examples.
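A minimal sketch of the pread() approach, assuming the msr kernel module is loaded and the program has permission to open the device (normally root). The helper name read_msr is made up for this example; the file offset passed to pread() is the MSR address, and 0xC1 is the architectural address of IA32_PMC0.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define IA32_PMC0 0xC1u  /* first general-purpose performance counter */

/* Hypothetical helper: read one 64-bit MSR on the given logical CPU.
 * Returns 0 on success, -1 if the device cannot be opened or read. */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    ssize_t n;
    int fd;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The offset selects which MSR to read; each read is 8 bytes. */
    n = pread(fd, value, sizeof(*value), (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

Calling read_msr(0, IA32_PMC0, &v) returns the raw register on CPU 0, not a per-process value, so the pinning and sharing caveats above still apply.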

There have been a number of approaches to reserving and sharing performance counters, including both software-only and combined hardware+software approaches, but at this point there is not a "standard" approach. (It looks like Intel has a hardware-based approach using MSR 0x392 IA32_PERF_GLOBAL_INUSE, but I don't know what platforms support it.)

Your questions

what will happen if my process is scheduled out when my_program() is running, and scheduled to another core?

You will see random garbage. The same is true if another process resets the PMCs between timeslices of your process.


c

linux

multithreading

intel