Reusing cudaEvent to serialize multiple streams

Suppose I have a struct:

typedef enum {ON_CPU,ON_GPU,ON_BOTH} memLocation;

typedef struct foo *foo;
struct foo {
  cudaEvent_t event;
  float       *deviceArray;
  float       *hostArray;
  memLocation arrayLocation;
};

and these functions:

void copyArrayFromCPUToGPUAsync(foo bar, cudaStream_t stream)
{
  cudaStreamWaitEvent(stream, bar->event);
  if (bar->arrayLocation == ON_CPU) {
    // ON_CPU means !ON_GPU and !ON_BOTH
    cudaMemcpyAsync(/* dst, src, count omitted */ cudaMemcpyHostToDevice, stream);
    bar->arrayLocation = ON_BOTH;
  }
  cudaEventRecord(bar->event, stream);
}

void doWorkOnGPUAsync(foo bar, cudaStream_t stream)
{
  cudaStreamWaitEvent(stream, bar->event);
  // do async work
  cudaEventRecord(bar->event, stream);
}

and the following scenario (with a lion, a witch, and a wardrobe fitting in somewhere as well):

// stream1, stream2, and stream3 have no prior work
// assume bar->arrayLocation = ON_GPU

doWorkOnGPUAsync(bar, stream1);
copyArrayFromCPUToGPUAsync(bar, stream2); // A no-op
doWorkOnGPUAsync(bar, stream3);

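Is the above safe? That is, will stream2 still wait for stream1 to finish its "work" even though stream2 does no work of its own? And will the resulting recorded cudaEvent reflect this, so that stream3 will not start until stream1 has finished?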

This should be safe.

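Nowhere (that I'm aware of) is there any mention of some kind of "event cancellation" caused by a lack of other work between waiting on an event and recording another one. So stream2's record is still ordered after its wait on stream1's record, even though stream2 enqueues nothing in between. It also doesn't matter that you reuse the same event object in the cudaEventRecord() calls, because, as the Runtime API documentation says: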

cudaEventRecord() can be called multiple times on the same event and will overwrite the previously captured state. Other APIs such as cudaStreamWaitEvent() use the most recently captured state at the time of the API call, and are not affected by later calls to cudaEventRecord().
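
The scenario from the question can be tried out with a minimal sketch like the one below. It is not the asker's code: the kernels `addOne` and `timesTwo`, the `CHECK` macro, and the function `runEventChainDemo` are hypothetical names introduced for illustration. stream1 adds one, stream2 only waits and re-records, and stream3 doubles, so a buffer that starts at 0 should end at 2 if the streams ran in that order (and at 1 if stream3 ran first).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: leave the enclosing function on any CUDA error.
// (For brevity, this sketch does not release resources on the error path.)
#define CHECK(call)                                          \
  do {                                                       \
    cudaError_t err_ = (call);                               \
    if (err_ != cudaSuccess) {                               \
      fprintf(stderr, "%s\n", cudaGetErrorString(err_));     \
      return 1;                                              \
    }                                                        \
  } while (0)

__global__ void addOne(float *x, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] += 1.0f;
}

__global__ void timesTwo(float *x, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= 2.0f;
}

// Returns 0 if every element ends up as (0 + 1) * 2 = 2, nonzero otherwise.
int runEventChainDemo(void)
{
  const int n = 1 << 16;
  const int threads = 256, blocks = (n + threads - 1) / threads;
  float *d = NULL;
  float *h = (float *)malloc(n * sizeof(float));
  cudaStream_t s1, s2, s3;
  cudaEvent_t ev;
  int bad = 0;

  if (h == NULL) return 1;
  CHECK(cudaMalloc(&d, n * sizeof(float)));
  CHECK(cudaMemset(d, 0, n * sizeof(float)));
  CHECK(cudaDeviceSynchronize());  // buffer is zeroed before any stream starts
  CHECK(cudaStreamCreate(&s1));
  CHECK(cudaStreamCreate(&s2));
  CHECK(cudaStreamCreate(&s3));
  CHECK(cudaEventCreateWithFlags(&ev, cudaEventDisableTiming));

  // stream1: the event has never been recorded, so this wait is a no-op.
  CHECK(cudaStreamWaitEvent(s1, ev, 0));
  addOne<<<blocks, threads, 0, s1>>>(d, n);
  CHECK(cudaEventRecord(ev, s1));

  // stream2: waits on stream1's record, does no work, re-records the event.
  CHECK(cudaStreamWaitEvent(s2, ev, 0));
  CHECK(cudaEventRecord(ev, s2));

  // stream3: waits on stream2's record, which is ordered after stream1's work.
  CHECK(cudaStreamWaitEvent(s3, ev, 0));
  timesTwo<<<blocks, threads, 0, s3>>>(d, n);
  CHECK(cudaEventRecord(ev, s3));

  CHECK(cudaGetLastError());       // catches kernel launch errors
  CHECK(cudaDeviceSynchronize());  // wait for all three streams
  CHECK(cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost));
  for (int i = 0; i < n; ++i) {
    if (h[i] != 2.0f) bad = 1;
  }

  cudaEventDestroy(ev);
  cudaStreamDestroy(s1);
  cudaStreamDestroy(s2);
  cudaStreamDestroy(s3);
  cudaFree(d);
  free(h);
  return bad;
}
```

Calling `runEventChainDemo()` from `main` in a file compiled with `nvcc` should return 0. A correct result is consistent with the ordering described above, but a single run cannot prove it; the guarantee comes from the documented semantics of cudaEventRecord() and cudaStreamWaitEvent().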


Additional notes:

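  • For your apparent use case, you may also want to consider using managed memory instead of manually copying back and forth.
  • You should check for the success of your various operations, rather than just assuming they succeeded.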
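
Both notes can be illustrated together with a rough sketch, under the assumption that a single managed allocation would replace the `hostArray`/`deviceArray` pair. The `CUDA_CHECK` macro, the `scale` kernel, and the function `runManagedDemo` are hypothetical names, not part of the CUDA API or of the question's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report the failing call's location and leave the
// enclosing function. (For brevity, resources are not released on this path.)
#define CUDA_CHECK(call)                                     \
  do {                                                       \
    cudaError_t err_ = (call);                               \
    if (err_ != cudaSuccess) {                               \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
              cudaGetErrorString(err_));                     \
      return 1;                                              \
    }                                                        \
  } while (0)

__global__ void scale(float *x, int n, float factor)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= factor;
}

// Returns 0 if every element was scaled from 1 to 3, nonzero otherwise.
int runManagedDemo(void)
{
  const int n = 1024;
  float *data = NULL;
  cudaStream_t stream;
  int bad = 0;

  // One pointer usable from host and device; no explicit cudaMemcpyAsync.
  CUDA_CHECK(cudaMallocManaged(&data, n * sizeof(float)));
  for (int i = 0; i < n; ++i) data[i] = 1.0f;  // host writes directly

  CUDA_CHECK(cudaStreamCreate(&stream));
  scale<<<(n + 255) / 256, 256, 0, stream>>>(data, n, 3.0f);
  CUDA_CHECK(cudaGetLastError());  // kernel launches return no status themselves

  // Synchronize before the host touches the managed buffer again.
  CUDA_CHECK(cudaStreamSynchronize(stream));
  for (int i = 0; i < n; ++i) {
    if (data[i] != 3.0f) bad = 1;
  }

  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(data));
  return bad;
}
```

With managed memory the `arrayLocation` bookkeeping would no longer be needed for copies, although the event-based ordering between streams that touch the same buffer still would be.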