Strict measurement of code performance

Question

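I am building a performance framework tool for measuring the processing time of individual messages on CentOS 7. I reserved one CPU for this task with the isolcpus kernel option, and I run the tool on that CPU using taskset.

OK, now the problem. I am trying to measure the maximum processing time across many messages. The processing time is <= 1000 ns, but when I run many iterations I get very high results (> 10000 ns).

I have created some simple code here that does nothing interesting but demonstrates the problem. Depending on the number of iterations, I get results such as: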

max: 84 min: 23 -> for 1000 iterations
max: 68540 min: 11 -> for 100000000 iterations

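I would like to understand where this difference comes from. I have tried running with real-time scheduling at the highest priority. Is there any way to prevent this?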

#include <algorithm>   // std::max, std::min
#include <cstdint>     // int64_t
#include <iostream>
#include <limits>
#include <time.h>

// Nanoseconds per second, kept signed so the arithmetic below stays signed.
const int64_t SEC = 1000L*1000L*1000L;

// Elapsed time from start to stop, in nanoseconds.
inline int64_t time_difference( const timespec &start,
                             const timespec &stop ) {
    return ( (stop.tv_sec * SEC - start.tv_sec * SEC) +
             (stop.tv_nsec - start.tv_nsec));
}
int main()
{
    timespec start, stop;
    int64_t max = 0, min = std::numeric_limits<int64_t>::max();

    for(int i = 0; i < 100000000; ++i){
        // Two back-to-back clock reads: the difference is the timer overhead
        // plus whatever interrupted the thread in between.
        clock_gettime(CLOCK_REALTIME, &start);
        clock_gettime(CLOCK_REALTIME, &stop);
        int64_t time = time_difference(start, stop);
        max = std::max(max, time);
        min = std::min(min, time);
    }
    std::cout << "max: " << max << " min: " << min << std::endl;
}
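
For reference, the setup described above (pinning to the isolated CPU and requesting real-time priority) can also be requested from inside the program rather than through taskset. The following is a minimal sketch, assuming Linux with glibc and g++ (which defines _GNU_SOURCE by default). The helper names pin_to_cpu, request_fifo_max, and lock_memory are hypothetical, and the mlockall call is an addition that the question does not mention.

```cpp
#include <cassert>
#include <sched.h>      // sched_setaffinity, sched_setscheduler, CPU_* macros
#include <sys/mman.h>   // mlockall

// Hypothetical helper: pin the calling thread to a single CPU.
// Returns true on success.
bool pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: request SCHED_FIFO at the maximum priority.
// This usually needs root or CAP_SYS_NICE, so failure is reported, not fatal.
bool request_fifo_max() {
    sched_param param{};
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
}

// Hypothetical helper (an assumption, not part of the question's setup):
// lock current and future pages in RAM so page faults do not show up
// as latency spikes inside the measurement loop.
bool lock_memory() {
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}
```

Calling pin_to_cpu with the isolated CPU's number, then request_fifo_max() and lock_memory(), before the measurement loop would approximate the taskset plus real-time configuration. Each call can fail without sufficient privileges, so the return values are worth checking.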

Answer 1

Let's check the documentation...

Isolation will be effected for userspace processes - kernel threads may still get scheduled on the isolcpus isolated CPUs.

So it seems that perfect isolation cannot be guaranteed, at least not from the kernel.
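
One way to check whether something else was actually scheduled on the isolated CPU during a run is to compare the thread's context-switch counters before and after the measurement loop. This is only a sketch, assuming Linux (RUSAGE_THREAD is Linux-specific and requires _GNU_SOURCE, which g++ defines by default). The struct SwitchCounts and the function thread_switches are hypothetical names.

```cpp
#include <cassert>
#include <sys/resource.h>   // getrusage, rusage

// Hypothetical container for the two counters of interest.
struct SwitchCounts {
    long voluntary;     // thread gave up the CPU itself (e.g. blocked)
    long involuntary;   // thread was preempted by the scheduler
};

// Read the context-switch counters of the calling thread.
SwitchCounts thread_switches() {
    rusage usage{};
    int rc = getrusage(RUSAGE_THREAD, &usage);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return {usage.ru_nvcsw, usage.ru_nivcsw};
}
```

If the involuntary count grows between a call made before the loop and one made after it, another task (possibly a kernel thread) ran on that CPU. Interrupt handlers that run without a task switch do not show up in these counters, so an unchanged count does not prove the CPU was undisturbed.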

Answer 2

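Even with isolcpus you cannot really reduce jitter to zero, because you still have at least the following:

1) Interrupts delivered to your CPU (you may be able to reduce this with IRQ affinity, but probably not to zero).

2) Timer interrupts are still scheduled on your process's CPU and may do a variable amount of work on the kernel side.

3) The CPU itself may pause briefly for P-state or C-state transitions, or for other reasons (e.g., to let voltage levels settle after turning on the AVX circuitry).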
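
Because these disturbances are rare events, the observed maximum tends to grow with the number of iterations, which is consistent with the numbers in the question. Looking at the distribution instead of only min and max shows how rare the spikes are. Below is a minimal sketch, assuming Linux; the helpers elapsed_ns, collect_samples, and percentile are hypothetical, and CLOCK_MONOTONIC is swapped in for the question's CLOCK_REALTIME so that wall-clock adjustments cannot affect the intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <time.h>

// Nanoseconds from start to stop, using signed 64-bit arithmetic.
inline std::int64_t elapsed_ns(const timespec& start, const timespec& stop) {
    return (static_cast<std::int64_t>(stop.tv_sec) - start.tv_sec) * 1000000000LL
         + (stop.tv_nsec - start.tv_nsec);
}

// Hypothetical helper: time n back-to-back clock reads.
// The vector is reserved up front so the loop does not allocate.
std::vector<std::int64_t> collect_samples(std::size_t n) {
    std::vector<std::int64_t> samples;
    samples.reserve(n);
    timespec start, stop;
    for (std::size_t i = 0; i < n; ++i) {
        clock_gettime(CLOCK_MONOTONIC, &start);
        clock_gettime(CLOCK_MONOTONIC, &stop);
        samples.push_back(elapsed_ns(start, stop));
    }
    return samples;
}

// Nearest-rank percentile of an ascending-sorted, non-empty vector.
// p is in the range (0, 100].
std::int64_t percentile(const std::vector<std::int64_t>& sorted, double p) {
    assert(!sorted.empty() && p > 0.0 && p <= 100.0);
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(sorted.size())));
    rank = std::max<std::size_t>(rank, 1);
    rank = std::min(rank, sorted.size());
    return sorted[rank - 1];
}
```

A typical use would be to call collect_samples, sort the result with std::sort, and then compare percentile(samples, 50.0), percentile(samples, 99.9), and percentile(samples, 100.0). A large gap between the high percentiles and the maximum suggests isolated interruptions rather than a slow measurement path.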

c++

linux

performance

clock