CUDA Runtime difference between release mode and debug mode

I am running Visual Studio 2013 and CUDA 7.0.28.

I can toggle the difference in runtime behavior by checking or unchecking the CUDA option:

Generate GPU debug Information.

I have a device kernel launched with <<<1,1>>>, and the error occurs even then.

My questions are:

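Why would it give me different results in release and debug mode?
What kind of things should I be looking for to try to track down why this is occurring?
Is there a way to set a breakpoint within a kernel function? It appears not. Other than adding printf statements, what other methods can I use to track down the problem?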

Thanks.

Why would it give me different results in the release and debug mode?

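Under the hood, the machine code generated from CUDA C/C++ source looks very different in debug mode. The list of differences is too long to cover here, but for the most part, compiler optimizations are turned off in debug mode. This can cause, for example, a race condition to be evident in debug but not in release, or vice versa.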
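To make the race-condition point concrete, here is a minimal sketch of the classic shared-memory version of the bug. It is not the asker's kernel, and the names `neighbor_racy`, `neighbor_fixed` and `run_neighbor_fixed` are hypothetical. Whether the racy kernel happens to produce correct output can depend on instruction scheduling, which is exactly what changes between a `-G` build and an optimized one. Note that with a `<<<1,1>>>` launch only one thread runs, so an inter-thread race like this one cannot occur there; the sketch only illustrates the general point.

```cuda
#include <cuda_runtime.h>

// Each thread writes its index to shared memory, then reads its right-hand
// neighbour's slot. Without a barrier, a thread may read the slot before the
// neighbour (possibly in another warp) has written it. The result then depends
// on scheduling, which differs between debug (-G) and optimized builds.
__global__ void neighbor_racy(int* out)
{
    __shared__ int buf[256];
    int t = threadIdx.x;
    buf[t] = t;
    // BUG: a __syncthreads() is missing here
    out[t] = buf[(t + 1) % blockDim.x];
}

// Same kernel with the barrier: all writes to buf complete before any read.
__global__ void neighbor_fixed(int* out)
{
    __shared__ int buf[256];
    int t = threadIdx.x;
    buf[t] = t;
    __syncthreads();
    out[t] = buf[(t + 1) % blockDim.x];
}

// Hypothetical host helper: runs neighbor_fixed with n threads (1 <= n <= 256)
// in a single block and copies the results into h_out. Returns 0 on success.
int run_neighbor_fixed(int* h_out, int n)
{
    if (n <= 0 || n > 256) return -1;
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    neighbor_fixed<<<1, n>>>(d_out);
    cudaError_t err = cudaGetLastError();          // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

Launching `neighbor_racy` instead under `cuda-memcheck --tool racecheck` should flag the shared-memory hazard even on runs where the output happens to look right.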

What kind of things should I be looking for to try and track down why this is occurring?

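I would start with the simplest tools. First, run your code under cuda-memcheck by itself to confirm that the kernel runs without producing basic errors. If cuda-memcheck reports a fault in your kernel, follow the method here to isolate the failure to a single line of source code. After fixing any errors reported in this fashion by cuda-memcheck, use the cuda-memcheck subtool options, including racecheck, synccheck, and initcheck, to see whether any of them catch a problem.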
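A minimal sketch of that workflow follows. The command lines in the comments use a placeholder file name (`app.cu`; on Windows the binary would be `app.exe`). The `CHECK_CUDA` macro, the `fill` kernel and the `fill_checked` helper are hypothetical illustrations, not part of the CUDA API. The key habit is checking both launch errors (`cudaGetLastError`) and execution errors (`cudaDeviceSynchronize`) after every kernel launch, so a failure is reported where it happens rather than at some later API call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Workflow as command lines (app.cu is a placeholder):
//   nvcc -lineinfo app.cu -o app          line info lets memcheck name the source line
//   cuda-memcheck ./app                    default tool (memcheck)
//   cuda-memcheck --tool racecheck ./app   shared-memory races
//   cuda-memcheck --tool synccheck ./app   invalid use of __syncthreads()
//   cuda-memcheck --tool initcheck ./app   reads of uninitialized global memory

// Hypothetical helper macro: prints file, line and error string of a failing
// runtime call, then returns the error code from the enclosing function.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                         cudaGetErrorString(err_));                       \
            return err_;                                                  \
        }                                                                 \
    } while (0)

__global__ void fill(int* data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;   // without this guard memcheck would flag writes past n
}

// Hypothetical helper: launches fill and returns the first error seen,
// distinguishing a bad launch configuration from a fault during execution.
cudaError_t fill_checked(int n, int threads_per_block)
{
    if (n <= 0 || threads_per_block <= 0) return cudaErrorInvalidValue;
    int* d = nullptr;
    CHECK_CUDA(cudaMalloc(&d, n * sizeof(int)));
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    fill<<<blocks, threads_per_block>>>(d, n, 7);
    cudaError_t launch = cudaGetLastError();     // e.g. too many threads per block
    cudaError_t exec = cudaDeviceSynchronize();  // e.g. illegal address in the kernel
    cudaFree(d);
    return launch != cudaSuccess ? launch : exec;
}
```

For example, `fill_checked(1000, 2048)` should return `cudaErrorInvalidConfiguration`, because 2048 exceeds the 1024-threads-per-block limit. In later CUDA toolkits, cuda-memcheck was superseded by compute-sanitizer, which accepts the same `--tool` names.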

Is there a way to set a breakpoint within the kernel function?

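Yes, debuggers are available on both Windows and Linux. On Windows, the debugger (Nsight Visual Studio Edition) is integrated into Visual Studio; on Linux it is cuda-gdb. There is documentation available, as well as walkthroughs and even YouTube videos demonstrating how to do various things such as setting breakpoints. However, I would not go down this path until after using cuda-memcheck.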
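As a hedged illustration, a kernel intended for the debugger is built with `-G`, which is what the Visual Studio "Generate GPU Debug Information" option enables, and then breakpoints can be set on device source lines. The file name `debug_me.cu`, the kernel `scale_kernel` and the helper `scale_on_device` are hypothetical. The device-side `printf` restricted to one thread is the fallback the question mentions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Debug build (device debug info via -G, host debug info via -g):
//   nvcc -G -g debug_me.cu -o debug_me
// Linux: cuda-gdb ./debug_me, then "break scale_kernel" and "run".
// Windows: set a breakpoint on a line below and start CUDA debugging in Nsight.
__global__ void scale_kernel(float* x, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= s;            // a convenient line for a breakpoint
        if (i == 0)           // printf fallback: print from one thread only
            printf("thread 0: x[0] = %f\n", x[0]);
    }
}

// Hypothetical helper: scales h[0..n) by s on the device and copies it back.
cudaError_t scale_on_device(float* h, int n, float s)
{
    if (n <= 0) return cudaErrorInvalidValue;
    float* d = nullptr;
    cudaError_t err = cudaMalloc(&d, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale_kernel<<<(n + 127) / 128, 128>>>(d, n, s);
        err = cudaGetLastError();                 // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // also flushes printf
    if (err == cudaSuccess)
        err = cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err;
}
```

Because `-G` disables device optimizations, a bug that only appears in the release build may disappear under the debugger; that is one more reason to try cuda-memcheck first.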
