Manual Deep Copy to Device in C

I am trying to parallelize an image-processing program with OpenACC. As part of that processing, I have defined a custom struct similar to the following:

typedef struct {
  RGB *image;
  double property;
} Deep;

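I access these through an array, Deep *structPointer.

I came across some documentation on manually copying the full contents of structPointer to the GPU, which left me with the following code.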

  Deep *structPointer = (Deep*)
    malloc(total_size*sizeof(Deep));
  assert(structPointer);

  int i;

  for (i = 0; i < total_size; i++)
  {
    structPointer[i].image = randomImage(width, height, max);
  }

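  /* Shallow copy of the struct array; dP is the device address of that copy. */
  Deep *dP = (Deep *) acc_copyin(structPointer, sizeof(Deep) * total_size);

  for (i = 0; i < total_size; i++) {
    /* Copy each image, then patch the device-side pointer with its device address (dA). */
    RGB *dA = (RGB *) acc_copyin(structPointer[i].image, sizeof(RGB) * width * height);
    acc_memcpy_to_device(&dP[i].image, &dA, sizeof(RGB *));
  }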

All of this runs fine until I try to run a parallel for loop that accesses structPointer and sets each array element's property member based on the contents of its RGB *image.

Pseudocode:

#pragma acc parallel loop copyin(inputImage[0:width*height], width, height)
for (i = 0; i < total_size; i++) {
  computeProperty(inputImage, structPointer+i, width, height);
}

inline void computeProperty (const RGB *A, Deep *B, int width, int height)
{
   B->property = 10;
}

I get:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

The output of cuda-memcheck is:

> ========= CUDA-MEMCHECK image2.ppm is a PPM file 256 x 256 image, max value= 255
> ========= Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to "invalid device context" on CUDA API call to cuCtxAttach.
> =========     Saved host backtrace up to driver entry point at error
> =========     Host Frame:/usr/lib64/libcuda.so (cuCtxAttach + 0x156) [0x13fc36]
> =========     Host Frame:./genimg_acc [0x13639]
> =========
> ========= Program hit CUDA_ERROR_ILLEGAL_ADDRESS (error 700) due to "an illegal memory access was encountered" on CUDA API call to
> cuStreamSynchronize. call to cuStreamSynchronize returned error 700:
> Illegal address during kernel execution
> =========     Saved host backtrace up to driver entry point at error
> =========     Host Frame:/usr/lib64/libcuda.so (cuStreamSynchronize + 0x13d) [0x149a9d]
> =========     Host Frame:./genimg_acc [0x15856]
> =========
> ========= Program hit CUDA_ERROR_ILLEGAL_ADDRESS (error 700) due to "an illegal memory access was encountered" on CUDA API call to
> cuCtxSynchronize.
> =========     Saved host backtrace up to driver entry point at error
> =========     Host Frame:/usr/lib64/libcuda.so (cuCtxSynchronize + 0x127) [0x13ee37]

Note that the program runs and produces correct results when it is compiled without OpenACC and run in a single thread.

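OK, I found a reference on OpenACC Deep Copying, which is probably what you were already looking at, judging by the name Deep. Figure 9 on page 7 gives an example of performing a deep copy on a struct that contains both scalars and pointers.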
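For comparison, here is a minimal directive-only sketch of a deep copy for this kind of struct. It is not taken from that reference. It assumes a compiler that implements OpenACC 2.6 or later, where a copyin of a member pointer's data also attaches the device address to the device copy of the parent struct. The RGB layout and the names mean_red and compute_properties are hypothetical, chosen only for illustration. Without OpenACC enabled the pragmas are ignored and the code runs on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed pixel layout; the original post does not show RGB. */
typedef struct { unsigned char r, g, b; } RGB;

typedef struct {
  RGB *image;
  double property;
} Deep;

/* Hypothetical property: mean red value over n pixels. */
#pragma acc routine seq
static double mean_red(const RGB *img, int n)
{
  double sum = 0.0;
  for (int k = 0; k < n; k++)
    sum += img[k].r;
  return sum / n;
}

/* Sets items[i].property for every element, going through the device
   when the file is compiled with OpenACC enabled. */
void compute_properties(Deep *items, int total_size, int width, int height)
{
  int n = width * height;
  double *out = (double *) malloc(total_size * sizeof(double));
  assert(items && out && n > 0);

  /* Shallow copy: the device structs still hold host image addresses. */
  #pragma acc enter data copyin(items[0:total_size])
  for (int i = 0; i < total_size; i++) {
    /* Deep part: copy each image; with OpenACC 2.6+ this also attaches
       the device address to the device copy of items[i].image. */
    #pragma acc enter data copyin(items[i].image[0:n])
  }

  #pragma acc parallel loop present(items[0:total_size]) copyout(out[0:total_size])
  for (int i = 0; i < total_size; i++) {
    out[i] = mean_red(items[i].image, n);
  }

  /* Results come back through a plain array: copying the whole struct
     array back would overwrite the host image pointers with device ones. */
  for (int i = 0; i < total_size; i++) {
    items[i].property = out[i];
    #pragma acc exit data delete(items[i].image[0:n])
  }
  #pragma acc exit data delete(items[0:total_size])
  free(out);
}
```

A caller would allocate the Deep array and each image on the host as in the question, then call compute_properties(structPointer, total_size, width, height) and read the property members afterwards.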

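You must use the pointer returned by acc_copyin to access the array of structs in the parallelized code, that is, dP rather than structPointer. The code below should resolve the problem.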

/* deviceptr(dP) tells the compiler that dP already holds a device address */
#pragma acc parallel loop copyin(inputImage[0:width*height], width, height) deviceptr(dP)
for (i = 0; i < total_size; i++) {
  computeProperty(inputImage, dP+i, width, height);
}
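
Putting the pieces together, here is a minimal end-to-end sketch of this approach. Several things in it are assumptions rather than part of the original program: the RGB layout, the function name set_properties, and the choice to store the red value of each image's first pixel as the property (picked only to show that the patched image pointer is usable on the device). The runtime calls are guarded by _OPENACC so the same file still builds and runs on the host with a plain C compiler.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Assumed pixel layout; the original post does not show RGB. */
typedef struct { unsigned char r, g, b; } RGB;

typedef struct {
  RGB *image;
  double property;
} Deep;

/* Hypothetical helper: sets each element's property to the red value
   of the first pixel of its image. */
void set_properties(Deep *structPointer, int total_size, int width, int height)
{
  Deep *dP = structPointer;  /* host fallback when built without OpenACC */
  int i;
  assert(structPointer && total_size > 0 && width > 0 && height > 0);

#ifdef _OPENACC
  size_t image_bytes = sizeof(RGB) * (size_t) width * (size_t) height;

  /* Shallow copy of the struct array; dP is its device address. */
  dP = (Deep *) acc_copyin(structPointer, sizeof(Deep) * total_size);
  for (i = 0; i < total_size; i++) {
    /* Copy the image, then overwrite the device-side pointer with dA. */
    RGB *dA = (RGB *) acc_copyin(structPointer[i].image, image_bytes);
    acc_memcpy_to_device(&dP[i].image, &dA, sizeof(RGB *));
  }
#endif

  /* deviceptr: dP is already a device address, so no data movement here. */
  #pragma acc parallel loop deviceptr(dP)
  for (i = 0; i < total_size; i++) {
    dP[i].property = dP[i].image[0].r;
  }

#ifdef _OPENACC
  for (i = 0; i < total_size; i++) {
    /* Copy back only the scalar; copying the whole struct would replace
       the host image pointer with a device address. */
    acc_memcpy_from_device(&structPointer[i].property, &dP[i].property,
                           sizeof(double));
    acc_delete(structPointer[i].image, image_bytes);
  }
  acc_delete(structPointer, sizeof(Deep) * total_size);
#endif
}
```

Only the property member is copied back at the end. A whole-struct copy out would bring the device image addresses back to the host and leave structPointer[i].image unusable there.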