Is the warmup code necessary when measuring CUDA kernel running time?

Question

On page 85 of Professional CUDA C Programming:

int main()
{
    ......
    // run a warmup kernel to remove overhead
    // (seconds() is the book's timer helper, assumed here to return
    //  wall-clock seconds as a double, so the timings are kept as double)
    double iStart, iElaps;
    cudaDeviceSynchronize();
    iStart = seconds();
    warmingup<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("warmup <<< %4d %4d >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 1
    iStart = seconds();
    mathKernel1<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel1 <<< %4d %4d >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 2
    iStart = seconds();
    mathKernel2<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel2 <<< %4d %4d >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 3
    iStart = seconds();
    mathKernel3<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel3 <<< %4d %4d >>> elapsed %f sec \n", grid.x, block.x, iElaps);
    ......
}

We can see that there is a warmup before the running times of the different kernels are measured.

From the question "GPU cards warming up?", I understand that the reason is:

If they are non-display cards, it might well be the driver shutting itself down after a period of inactivity. So what you are seeing on the first run might well be initialization overhead that only happens once.

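So if my GPU card has not been idle for a long time, for example because I have just used it to run some programs, it should not need any warmup code. Is my understanding correct?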

Answer 1

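Besides the GPU being in a power-saving state, there are a number of other reasons why the first launch of a kernel may be slower than subsequent runs:

just-in-time compilation
transfer of the kernel to GPU memory
cache content
...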

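For these reasons, if you are interested in the sustained speed that consecutive kernel launches achieve, it is always good practice to perform at least one "warmup run" before the timed kernel runs.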
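
As a rough illustration of that pattern, here is a minimal sketch that times the first launch separately and then averages several later launches. It is not the book's code: scaleKernel and timeLaunchMs are hypothetical names made up for this example, the problem size and run count are arbitrary, and it times with CUDA events rather than the seconds() helper from the question. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; any kernel under test could take its place.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Hypothetical helper: times a single launch with CUDA events and
// returns the elapsed time in milliseconds.
float timeLaunchMs(float *d_data, int n, dim3 grid, dim3 block)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    scaleKernel<<<grid, block>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until the kernel and stop event are done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;  // arbitrary problem size for the sketch
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    // First launch: may include one-time costs, so report it on its own.
    float firstMs = timeLaunchMs(d_data, n, grid, block);
    printf("first launch (warmup): %.3f ms\n", firstMs);

    // Later launches: average several runs to estimate the sustained speed.
    const int runs = 10;
    float totalMs = 0.0f;
    for (int r = 0; r < runs; ++r)
        totalMs += timeLaunchMs(d_data, n, grid, block);
    printf("after warmup: %.3f ms average over %d runs\n", totalMs / runs, runs);

    cudaFree(d_data);
    return 0;
}
```

Printing the first launch next to the later average shows how much one-time overhead, if any, your own setup has, which is more reliable than guessing from how recently the card was used.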

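However, if you have a specific application and use case in mind, it always makes sense to benchmark that application under the relevant conditions. Be prepared for much larger variation in run times when the measurement is less controlled.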
