How to correctly measure the running time of an OpenCL program

Question

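I ran into some confusion while using OpenCL to test a program's performance. The goal of our experiment is to measure the program's computation time, but we hit a problem in practice. In the code below, each timeX (time1, time2, time3) is measured by attaching an event to the corresponding command. However, in the actual results we found that (end_time - start_time) < time1 + time2 + time3. We don't know why, and we would also like to know how we should measure the running time of the task.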

start_time = clock()
compute(){
    writeData();  // time1
    clEnqueueNDRangeKernel(); // time2
    readData(); //time3
    do_other();
}
end_time = clock()

Answer 1

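It depends on what exactly you want to measure. If it is just for profiling, you are probably interested in time1, time2 and time3 separately. If it is to measure the performance of compute(), you should measure its entire run time.

To measure time with high (typically nanosecond) resolution, use this clock: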

#include <chrono>
class Clock {
private:
    typedef std::chrono::high_resolution_clock clock;
    std::chrono::time_point<clock> t; // time of the last start()
public:
    Clock() { start(); }
    void start() { t = clock::now(); }
    // seconds elapsed since the last start()
    double stop() const { return std::chrono::duration_cast<std::chrono::duration<double>>(clock::now()-t).count(); }
};
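
As a quick sanity check of such a timer, here is a minimal sketch that times a known delay. Two things in it are my own assumptions rather than part of the answer: it uses std::chrono::steady_clock instead of high_resolution_clock (steady_clock is guaranteed to be monotonic, while high_resolution_clock may be an alias for the adjustable system clock on some standard libraries), and the names SteadyTimer and time_sleep are hypothetical.

```cpp
#include <chrono>
#include <thread>

// Hypothetical variant of the Clock class above, built on steady_clock
// (a substitution made here, not what the original answer uses).
class SteadyTimer {
private:
    typedef std::chrono::steady_clock clock;
    std::chrono::time_point<clock> t; // time of the last start()
public:
    SteadyTimer() { start(); }
    void start() { t = clock::now(); }
    // seconds elapsed since the last start()
    double stop() const {
        return std::chrono::duration_cast<std::chrono::duration<double>>(clock::now() - t).count();
    }
};

// Hypothetical helper: sleep for the given duration and return the
// measured elapsed time in seconds.
double time_sleep(std::chrono::milliseconds d) {
    SteadyTimer timer;              // constructor starts the timer
    std::this_thread::sleep_for(d); // blocks for at least d
    return timer.stop();
}
```

Calling time_sleep(std::chrono::milliseconds(50)) should return a value of roughly 0.05 or slightly more; if it returns something far smaller, the timer is not measuring wall-clock time.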

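In your example, there are two possible reasons why (end_time - start_time) < time1 + time2 + time3:

Your clock implementation is not accurate enough, and the difference is just rounding error.
You did not use clFinish(queue);. OpenCL commands are only enqueued; they are not executed immediately. So clEnqueueNDRangeKernel();, for example, returns immediately, and if you measure the time before and after it, you get practically 0. To wait until the kernel has actually finished executing, you need to call clFinish afterwards.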
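
The effect of the second point can be reproduced without any OpenCL at all. The following is only a simulation, assuming std::async as a stand-in for the command queue: enqueue_fake_kernel, Timings and measure_fake_kernel are hypothetical names, and done.wait() plays the role that clFinish(queue) plays in real OpenCL code.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for an asynchronous command such as a kernel launch:
// the work runs on another thread and the call itself returns right away.
std::future<void> enqueue_fake_kernel(std::chrono::milliseconds work) {
    return std::async(std::launch::async,
                      [work] { std::this_thread::sleep_for(work); });
}

struct Timings {
    double enqueue_only; // seconds measured around the enqueue call alone
    double with_wait;    // seconds measured after also waiting for completion
};

Timings measure_fake_kernel(std::chrono::milliseconds work) {
    using steady = std::chrono::steady_clock;
    using seconds = std::chrono::duration<double>;

    const auto t0 = steady::now();
    std::future<void> done = enqueue_fake_kernel(work); // does not wait for the work
    const auto t1 = steady::now();
    done.wait();                                        // simulated clFinish(queue)
    const auto t2 = steady::now();

    return { seconds(t1 - t0).count(), seconds(t2 - t0).count() };
}
```

With a simulated workload of 100 ms, enqueue_only should come out far below 0.1 s while with_wait should be at least about 0.1 s, which is the same pattern you would see when timing clEnqueueNDRangeKernel without and with clFinish.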

Examples:

To measure time1, time2 and time3 separately:

compute() {
    //clFinish(queue); // if there could still be stuff in the queue, finish it first
    Clock clock;
    clock.start();
    writeData(); // time1
    clFinish(queue);
    const double time1 = clock.stop(); // in seconds
    clock.start();
    clEnqueueNDRangeKernel(); // time2
    clFinish(queue);
    const double time2 = clock.stop(); // in seconds
    clock.start();
    readData(); //time3
    clFinish(queue);
    const double time3 = clock.stop(); // in seconds
    do_other();
    clFinish(queue); // don't forget to finish the queue!
}

To measure the entire execution time of compute(), you only need a single clFinish at the end.

compute() {
    //clFinish(queue); // if there could still be stuff in the queue, finish it first
    Clock clock;
    clock.start();
    writeData(); // time1
    clEnqueueNDRangeKernel(); // time2
    readData(); //time3
    do_other();
    clFinish(queue); // don't forget to finish the queue!
    const double time = clock.stop(); // in seconds
}
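
Since the question measures time1, time2 and time3 through events, it can also help to compare the sum of the per-command durations with the span they cover. The sketch below is plain arithmetic on timestamps and contains no OpenCL calls; it assumes the start and end values were already read in nanoseconds, for example with clGetEventProfilingInfo using CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a queue created with CL_QUEUE_PROFILING_ENABLE. EventTimes, sum_of_durations and overall_span are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical holder for one command's profiling timestamps, in nanoseconds.
struct EventTimes {
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Sum of the individual command durations, in seconds.
double sum_of_durations(const std::vector<EventTimes>& events) {
    double total = 0.0;
    for (const EventTimes& e : events) {
        total += static_cast<double>(e.end_ns - e.start_ns) * 1e-9;
    }
    return total;
}

// Time from the earliest start to the latest end, in seconds.
double overall_span(const std::vector<EventTimes>& events) {
    if (events.empty()) return 0.0;
    std::uint64_t first = events.front().start_ns;
    std::uint64_t last = events.front().end_ns;
    for (const EventTimes& e : events) {
        first = std::min(first, e.start_ns);
        last = std::max(last, e.end_ns);
    }
    return static_cast<double>(last - first) * 1e-9;
}
```

If the commands ran strictly back to back, the two values agree. If the sum is larger than the span, some commands overlapped in time; if it is smaller, there were idle gaps between them. Note that these timestamps come from the device's own counter, so they are not directly comparable with host-side timestamps.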
