Does a __device__ variable have a size limit?

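I want to use global variables shared by several kernels, but when I use the following code to initialize the __device__ variable, I get an [access violation on store (global memory)] error while initializing the second buffer.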

__device__ short* blockTmp[4];
//init blockTmp
template<int BS>
__global__ void InitQuarterBuf_kernel()
{
    int iBufSize = 2000000;
    for (int i = 0; i < 4; i++){
        blockTmp[i] = new short[iBufSize];
        blockTmp[i][iBufSize-1]=1;
        printf("blockTmp[%d][%d] is %d.\n",i,iBufSize-1,blockTmp[i][iBufSize-1]);
    }
}

Error message:

Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 94
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x003d08fe
accessSize = 2

CUDA grid launch failed: CUcontext: 1014297073536 CUmodule: 1013915337344 Function: _Z21InitBuf_kernelILi8EEvii
CUDA context created : 67e557f3e0
CUDA module loaded:   67cdc7ed80 

CUDA module loaded:   67cdc7e180 
================================================================================
CUDA Memory Checker detected 1 threads caused an access violation:
Launch Parameters
    CUcontext    = 67e557f3e0
    CUstream     = 67cdc7f580
    CUmodule     = 67cdc7e180
    CUfunction   = 67eb64b2f0
    FunctionName = _Z21InitBuf_kernelILi8EEvii
    GridId       = 94
    gridDim      = {1,1,1}
    blockDim     = {1,1,1}
    sharedSize   = 256
    Parameters (raw):
         0x00000780 0x00000440
GPU State:
   Address  Size      Type  Mem       Block  Thread         blockIdx  threadIdx                                         PC  Source
----------------------------------------------------------------------------------------------------------------------------------
  003d08fe     2    adr st    g           0       0          {0,0,0}    {0,0,0}  _Z21InitBuf_kernelILi8EEvii+0004c8  


Summary of access violations:
xxxx_launcher.cu(481): error MemoryChecker: #misaligned=0  #invalidAddress=1
================================================================================

CUDA grid launch failed: CUcontext: 446229378016 CUmodule: 445834060160 Function: _Z21InitBuf_kernelILi8EEvii

Is there a size limit on a __device__ variable? How can I initialize a __device__ variable?

If I change the buffer size to 1000, it works.

The kernel you posted doesn't make sense: your __device__ variable is named blockTmp, but the kernel initializes a variable named m_filteredBlockTmp, which doesn't appear to be defined anywhere.

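Anyway, assuming those are the same variable, the problem is probably not related to your use of a __device__ variable (a pointer, in this case) but to your use of in-kernel new, which definitely has allocation limits.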

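These limits and this behavior are the same as for in-kernel malloc, as described in the programming guide. In particular, the default limit is 8MB, and if you need more (in the "device heap"), you must explicitly raise the limit with a CUDA runtime API call.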
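As a rough sketch of raising the limit: the four buffers in the question need 4 * 2,000,000 * sizeof(short) = 16,000,000 bytes, which is more than the 8MB default. The helper name setDeviceHeapBytes and the 2x headroom factor below are assumptions made for illustration; cudaDeviceSetLimit and cudaLimitMallocHeapSize are the runtime API pieces involved. The call has to happen before any kernel that uses in-kernel new or malloc is launched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): set the size of the
// device heap that backs in-kernel new/malloc. Call it before launching
// any kernel that allocates from that heap.
bool setDeviceHeapBytes(size_t bytes)
{
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    // 4 buffers of 2,000,000 shorts each, as in the question.
    const size_t needed = 4 * 2000000 * sizeof(short);
    // Assumption: 2x headroom for allocator overhead; the factor is arbitrary.
    const size_t heapBytes = 2 * needed;

    if (!setDeviceHeapBytes(heapBytes)) return 1;

    // Read the limit back to see what the runtime actually applied.
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize);
    printf("device heap limit is now %zu bytes\n", actual);
    return 0;
}
```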

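A useful error check in these situations is to test whether the pointer returned by new or malloc is NULL, which indicates that the allocation failed. If you skip that check and then try to use the pointer anyway, you will run into the kind of trouble described in your post.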
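The reported faulting address fits this explanation: 0x003d08fe is 3,999,998, which equals (2000000 - 1) * sizeof(short), so the store went through a NULL base pointer. Below is a minimal sketch of the check, assuming a single-thread launch like the one in the log. The names InitBuf_checked, FreeBuf, allocFailures and runInit are hypothetical and do not come from the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ short* blockTmp[4];
__device__ int allocFailures = 0;  // hypothetical counter, not in the original post

__global__ void InitBuf_checked(int bufSize)
{
    for (int i = 0; i < 4; i++) {
        blockTmp[i] = new short[bufSize];
        if (blockTmp[i] == NULL) {
            // Allocation failed (device heap exhausted): do not touch the pointer.
            printf("allocation %d of %d shorts failed\n", i, bufSize);
            atomicAdd(&allocFailures, 1);
            continue;
        }
        blockTmp[i][bufSize - 1] = 1;
    }
}

// Memory obtained with in-kernel new must be released with in-kernel delete.
__global__ void FreeBuf()
{
    for (int i = 0; i < 4; i++) {
        if (blockTmp[i] != NULL) delete[] blockTmp[i];
        blockTmp[i] = NULL;
    }
}

// Hypothetical host wrapper: returns the number of failed allocations,
// or -1 if the kernel itself reported a CUDA error.
int runInit(int bufSize)
{
    int zero = 0, failures = 0;
    cudaMemcpyToSymbol(allocFailures, &zero, sizeof(int));
    InitBuf_checked<<<1, 1>>>(bufSize);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    cudaMemcpyFromSymbol(&failures, allocFailures, sizeof(int));
    FreeBuf<<<1, 1>>>();
    cudaDeviceSynchronize();
    return failures;
}

int main()
{
    // With the default heap size some of these allocations are expected to
    // fail; the kernel now reports that instead of faulting.
    printf("failed allocations: %d\n", runInit(2000000));
    return 0;
}
```

If the count is nonzero, the device heap limit is too small for the requested buffers and needs to be raised before the kernel is launched.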