Can you programmatically know the max blocks and threads per block in a GPU?

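I am writing a CUDA program that may run on many different GPUs. I would like to know whether CUDA provides a way to read the current GPU's capabilities from code (at run time or compile time), meaning the number of threads a single block can contain and the maximum number of blocks, so that I can tune my kernel launches to make the best use of all resources.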

I know this sounds like a silly question, but I couldn't find any answer online.

Bonus question, if that isn't possible: I saw someone here say they know that the Jetson TX1 has

2 SM’s - each with 128 cores. I read that per SM (which I understand there are 2) there can be a maximum of 16 active blocks, and 64 active warps (or 2048 active threads).

How can I find this information for a given GPU?

I guess cudaGetDeviceProperties is what you're looking for.
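
A minimal sketch of how that might look at run time, assuming the CUDA runtime API and a host-only program built with nvcc. The helper name `printLaunchLimits` is made up for illustration and is not part of CUDA. The `maxBlocksPerMultiProcessor` field is, as far as I know, only available from CUDA 11 onward, so it is guarded by a version check here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): prints the launch-related limits
// reported by cudaGetDeviceProperties for one device.
cudaError_t printLaunchLimits(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        return err;
    }

    std::printf("Device %d: %s (compute capability %d.%d)\n",
                device, prop.name, prop.major, prop.minor);

    // Per-block and per-grid limits (the main question).
    std::printf("  max threads per block : %d\n", prop.maxThreadsPerBlock);
    std::printf("  max block dimensions  : %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("  max grid dimensions   : %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Per-SM figures (the bonus question).
    std::printf("  SM count              : %d\n", prop.multiProcessorCount);
    std::printf("  warp size             : %d\n", prop.warpSize);
    std::printf("  max threads per SM    : %d\n", prop.maxThreadsPerMultiProcessor);
    std::printf("  max warps per SM      : %d\n",
                prop.maxThreadsPerMultiProcessor / prop.warpSize);
#if CUDART_VERSION >= 11000
    // Assumption: this field exists only in CUDA 11.0 and later.
    std::printf("  max blocks per SM     : %d\n", prop.maxBlocksPerMultiProcessor);
#endif
    return cudaSuccess;
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Query every visible device, since the program may run on different GPUs.
    for (int d = 0; d < count; ++d) {
        err = printLaunchLimits(d);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", d, cudaGetErrorString(err));
            return 1;
        }
    }
    return 0;
}
```

These values are hardware upper bounds. How many blocks of a particular kernel actually fit on an SM also depends on that kernel's register and shared memory usage, so for launch tuning it may be worth looking at the runtime's occupancy helpers such as cudaOccupancyMaxPotentialBlockSize as well.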