How to observe CUDA events and metrics for a subsection of an executable (e.g. only during a kernel execution)?

Question

I'm familiar with using nvprof to access the events and metrics of a benchmark, e.g.,

nvprof --system-profiling on --print-gpu-trace -o (file name) --events inst_issued1 ./benchmarkname

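The options

--system-profiling on --print-gpu-trace -o (filename)

give timestamps for the kernel start and end times, along with power and temperature, and save the information to an nvvp file so that it can be viewed in the Visual Profiler. This lets us see what is happening in any section of the code, in particular while a specific kernel is running. My question is this: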

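Is there a way to isolate the events counted for only a section of the benchmark run, for example during a kernel execution? In the command above,

--events inst_issued1

only gives the instruction count for the whole executable. Thanks!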

Answer 1

You may want to read the profiler documentation.

You can turn profiling on and off from within the executable. The CUDA runtime API calls for this are:

cudaProfilerStart() 
cudaProfilerStop()

So, if you want to collect profile information only for a specific kernel, you could do:

#include <cuda_profiler_api.h>
...

cudaProfilerStart();
myKernel<<<...>>>(...);
cudaProfilerStop();

And, excerpting from the documentation:

When using the start and stop functions, you also need to instruct the profiling tool to disable profiling at the start of the application. For nvprof you do this with the --profile-from-start off flag. For the Visual Profiler you use the Start execution with profiling enabled checkbox in the Settings View.
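Putting the pieces above together, a minimal self-contained sketch might look like the following. The kernel addOne, the helper runProfiledRegion, and the file name profiled_region.cu are hypothetical names chosen for illustration; only cudaProfilerStart(), cudaProfilerStop(), and the --profile-from-start off flag come from the documentation quoted above.

```cuda
// Build (assumed file name):  nvcc -o profiled_region profiled_region.cu
// Run under nvprof, with profiling disabled until cudaProfilerStart():
//   nvprof --profile-from-start off --events inst_issued1 ./profiled_region
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Hypothetical kernel, for illustration only: adds 1.0f to each element.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launches the kernel once outside and once inside the profiled region.
// Returns true if the CUDA calls succeeded and the result is as expected.
bool runProfiledRegion() {
    const int n = 1 << 20;
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMemset(d, 0, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d);
        return false;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Outside the start/stop pair: not collected when the tool is run
    // with --profile-from-start off.
    addOne<<<blocks, threads>>>(d, n);
    cudaDeviceSynchronize();

    cudaProfilerStart();
    addOne<<<blocks, threads>>>(d, n);
    // Kernel launches are asynchronous, so wait for completion before
    // switching profiling off again.
    cudaDeviceSynchronize();
    cudaProfilerStop();

    float first = 0.0f;
    cudaError_t err = cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    // Two launches, each adding 1.0f to a zero-initialized buffer.
    return err == cudaSuccess && first == 2.0f;
}

int main() {
    bool ok = runProfiledRegion();
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

With this setup, the event counts reported by nvprof should cover only the second launch, since the first one runs before profiling is switched on.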

Also from the documentation, for nvprof specifically, you can limit event/metric tabulation to a single kernel with a command line switch:

 --kernels <kernel name>

The documentation describes additional usage possibilities.

Answer 2

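After looking into this a bit more, it turns out that kernel-level information is also reported for all kernels (without using --kernels and naming them individually) by running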

nvprof --events <event names> --metrics <metric names> ./<cuda benchmark>

In fact, it gives output of the form

"Device","Kernel","Invocations","Event Name","Min","Max","Avg"

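If a kernel is called multiple times in the benchmark, this lets you see the Min, Max, and Avg of the requested events across those kernel runs. Evidently, the --kernels option in the CUDA 7.5 profiler allows each individual run of each kernel to be specified.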

profiling

cuda

nvvp

nvprof