Dynamically allocated shared memory in CUDA. Execution Configuration

Question

What does Nvidia mean by this?

Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;

My GPU has 48 kB of shared memory. Suppose I want to run 4 kernels at the same time, each using 12 kB of shared memory.

To do that, should I call the kernel like this

kernel<<< gridSize, blockSize, 12 * 1024 >>>();

or should the third argument be 48 * 1024?

Answer 1

Ns is a size in bytes. If you want to reserve 12 kB of shared memory, you would pass 12*1024.
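As a rough illustration of how the byte count and the external array fit together, here is a minimal sketch. The kernel `reverseChunks`, the helper `dynamicSharedBytes`, and the sizes (3072 ints per block, 2 blocks, 256 threads) are made up for this example and are not from the question; it also assumes a 4-byte `int`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: bytes of dynamic shared memory one block needs
// to hold `intsPerBlock` integers.
constexpr size_t dynamicSharedBytes(int intsPerBlock)
{
    return static_cast<size_t>(intsPerBlock) * sizeof(int);
}

// Hypothetical kernel: each block reverses its own chunk of `data`
// by staging it in dynamically sized shared memory.
__global__ void reverseChunks(int *data, int chunk)
{
    // The size of this array is not known at compile time; it is the
    // third launch parameter (Ns), given in bytes.
    extern __shared__ int buffer[];

    int *base = data + blockIdx.x * chunk;
    for (int i = threadIdx.x; i < chunk; i += blockDim.x)
        buffer[i] = base[i];
    __syncthreads();
    for (int i = threadIdx.x; i < chunk; i += blockDim.x)
        base[i] = buffer[chunk - 1 - i];
}

int main()
{
    const int chunk = 3072;      // 3072 ints * 4 bytes = 12 kB per block
    const int numBlocks = 2;
    const int threads = 256;
    const int n = chunk * numBlocks;

    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *device = nullptr;
    if (cudaMalloc(&device, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(device, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Ns is per block: every one of the numBlocks blocks gets 12 kB.
    const size_t ns = dynamicSharedBytes(chunk);
    reverseChunks<<<numBlocks, threads, ns>>>(device, chunk);
    cudaError_t status = cudaDeviceSynchronize();

    cudaMemcpy(host.data(), device, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(device);

    // Check that every chunk came back reversed.
    bool ok = (status == cudaSuccess);
    for (int b = 0; b < numBlocks && ok; ++b)
        for (int i = 0; i < chunk; ++i)
            ok = ok && (host[b * chunk + i] == b * chunk + (chunk - 1 - i));

    printf("%s\n", ok ? "reversed" : "mismatch or launch failure");
    return ok ? 0 : 1;
}
```

The kernel never states how large `buffer` is; the only place the 12 kB appears is the third launch parameter.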

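I doubt that is what you want, though. The Ns value is PER BLOCK, so it is the amount of shared memory for each block executed on the device. I guess you want to do something along the lines of 12*1024/number_of_blocks;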
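To see what "per block" implies in practice, one option is to ask the runtime how many blocks of a kernel can be resident on one multiprocessor for a given Ns. The sketch below uses `cudaGetDeviceProperties` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor`; the kernel `usesDynamicShared`, the helper `blocksThatFit`, the block size of 128, and the list of sizes are assumptions chosen for illustration. The helper is only an upper bound from simple division; the runtime's answer also accounts for registers, thread limits, and any shared memory the hardware reserves.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that touches dynamically allocated shared memory.
__global__ void usesDynamicShared(float *out)
{
    extern __shared__ float scratch[];
    scratch[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    if (out != nullptr && threadIdx.x == 0)
        out[blockIdx.x] = scratch[blockDim.x - 1];
}

// Hypothetical helper: upper bound on resident blocks per multiprocessor
// if shared memory were the only limit.
size_t blocksThatFit(size_t sharedPerSM, size_t bytesPerBlock)
{
    return bytesPerBlock == 0 ? 0 : sharedPerSM / bytesPerBlock;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    printf("shared memory per SM:    %zu bytes\n", prop.sharedMemPerMultiprocessor);

    const int blockSize = 128;
    const size_t sizes[] = {0, 12 * 1024, 24 * 1024, 48 * 1024};

    for (size_t ns : sizes) {
        int blocksPerSM = 0;
        // Reports how many blocks of this kernel can be active on one SM
        // when each block asks for `ns` bytes of dynamic shared memory.
        cudaError_t status = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, usesDynamicShared, blockSize, ns);
        if (status != cudaSuccess) {
            printf("Ns = %zu: %s\n", ns, cudaGetErrorString(status));
            continue;
        }
        printf("Ns = %zu bytes -> %d active blocks per SM (division bound: %zu)\n",
               ns, blocksPerSM,
               blocksThatFit(prop.sharedMemPerMultiprocessor, ns));
    }
    return 0;
}
```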

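On launching kernels concurrently: if, as mentioned in the comments, you are using streams, the kernel launch takes a fourth argument, which is the CUDA stream.

If you wanted to launch a kernel on another stream without any shared memory, it would look like: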

kernel_name<<<128, 128, 0, mystream>>>(...);

But concurrency is a whole different issue.
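Putting the two launch parameters together for the scenario in the question, a sketch could look like the following. The kernel `fillFromShared`, the constants `kNumStreams` and `kSharedBytes`, and the grid and block sizes are assumptions for illustration. Separate streams only allow the kernels to overlap; whether they actually run at the same time depends on the device and on the resources each kernel uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kNumStreams = 4;              // four kernels, as in the question
constexpr size_t kSharedBytes = 12 * 1024;  // 12 kB of dynamic shared memory per block

// Hypothetical kernel: writes `value` to the output through shared memory.
__global__ void fillFromShared(int *out, int value)
{
    extern __shared__ int scratch[];
    scratch[threadIdx.x] = value;
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[threadIdx.x];
}

int main()
{
    const int blocks = 1;
    const int threads = 128;

    cudaStream_t streams[kNumStreams];
    int *buffers[kNumStreams] = {nullptr};

    for (int i = 0; i < kNumStreams; ++i) {
        cudaStreamCreate(&streams[i]);
        cudaMalloc(&buffers[i], blocks * threads * sizeof(int));
    }

    // Third parameter: bytes of dynamic shared memory per block.
    // Fourth parameter: the stream the launch is queued on.
    for (int i = 0; i < kNumStreams; ++i)
        fillFromShared<<<blocks, threads, kSharedBytes, streams[i]>>>(buffers[i], i);

    // Wait for all streams and report any launch or execution error.
    cudaError_t status = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(status));

    for (int i = 0; i < kNumStreams; ++i) {
        cudaFree(buffers[i]);
        cudaStreamDestroy(streams[i]);
    }
    return status == cudaSuccess ? 0 : 1;
}
```

Each launch passes 12 * 1024, not 48 * 1024, because the value describes what one block of that one kernel needs.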


cuda

shared-memory