cudaOccupancyMaxActiveBlocksPerMultiprocessor is undefined

Question

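I am trying to learn CUDA and use it efficiently. On NVIDIA's website I found a piece of code that reports the occupancy for a given block size, which should help me figure out which block size makes the most efficient use of the device. The code is as follows: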

#include <iostream>

// Device code
__global__ void MyKernel(int *d, int *a, int *b)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    d[idx] = a[idx] * b[idx];
}

// Host code
int main()
{
    int numBlocks;        // Occupancy in terms of active blocks
    int blockSize = 32;

    // These variables are used to convert occupancy to warps
    int device;
    cudaDeviceProp prop;
    int activeWarps;
    int maxWarps;

    cudaGetDevice(&device);
    cudaGetDeviceProperties(&prop, device);

    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks,
        MyKernel,
        blockSize,
        0);

    activeWarps = numBlocks * blockSize / prop.warpSize;
    maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;

    std::cout << "Occupancy: " << (double)activeWarps / maxWarps * 100 << "%" << std::endl;

    return 0;
}

But when I compile it, I get the following error:

Compile command:

nvcc ben_deneme2.cu -arch=sm_35 -rdc=true -lcublas -lcublas_device -lcudadevrt -o my

Error:

ben_deneme2.cu(25): error: identifier "cudaOccupancyMaxActiveBlocksPerMultiprocessor" is undefined

1 error detected in the compilation of "/tmp/tmpxft_0000623d_00000000-8_ben_deneme2.cpp1.ii".

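Do I need to link an additional library for this? I could not find the name of any such library on the Internet. Or am I doing something else wrong? Thanks in advance.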

Answer 1

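The cudaOccupancyMaxActiveBlocksPerMultiprocessor function was introduced in CUDA 6.5. If you have an earlier version of CUDA installed, the function is not available; for example, CUDA 5.5 does not provide it.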

If you want to use this function, you have to upgrade your CUDA installation to version 6.5 or later.
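A quick way to confirm this diagnosis is to check which runtime version you are actually compiling against (for example with nvcc --version). The sketch below assumes the runtime's version encoding of 1000 * major + 10 * minor (so 5050 is CUDA 5.5 and 6050 is CUDA 6.5), which is how both the CUDART_VERSION macro and cudaRuntimeGetVersion() report it. The helper names decodeCudaVersion and hasOccupancyApi are hypothetical, chosen for illustration only.

```cpp
#include <cassert>

// The CUDA runtime reports its version (CUDART_VERSION at compile time,
// cudaRuntimeGetVersion() at run time) as 1000 * major + 10 * minor,
// e.g. 5050 for CUDA 5.5 and 6050 for CUDA 6.5.
struct CudaVersion {
    int major;
    int minor;
};

// Hypothetical helper: split an encoded runtime version into major/minor.
CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// cudaOccupancyMaxActiveBlocksPerMultiprocessor first shipped in CUDA 6.5.
const int kOccupancyApiMinVersion = 6050;

// Hypothetical helper: true if this runtime version provides the occupancy API.
bool hasOccupancyApi(int encoded) {
    return encoded >= kOccupancyApiMinVersion;
}
```

In a .cu file the same comparison can be made at compile time: code that must also build on older toolkits can wrap the call in #if CUDART_VERSION >= 6050 ... #else ... #endif and fall back to a fixed block size in the #else branch.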

People on older versions usually use the CUDA Occupancy Calculator spreadsheet instead.
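Without the API, the core arithmetic behind the Occupancy Calculator can be approximated on the host: the number of resident blocks per SM is the smallest of the limits imposed by threads, the hardware block cap, registers and shared memory. This is a simplified sketch, not the calculator's exact algorithm. It ignores register and shared-memory allocation granularity, so real hardware may fit fewer blocks, and the SmLimits struct and function names are hypothetical. The limits must be filled in for your GPU; the test values assumed here (2048 threads, 16 blocks, 65536 registers, 48 KB shared memory, warp size 32 per SM) correspond to compute capability 3.5.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical per-SM resource limits; take real values from your GPU's
// documentation or from cudaDeviceProp.
struct SmLimits {
    int maxThreadsPerSm;
    int maxBlocksPerSm;
    int registersPerSm;
    int sharedMemPerSm;  // bytes
    int warpSize;
};

// Resident blocks per SM = minimum over every resource limit.
// Simplification: allocation granularity is ignored.
int estimateBlocksPerSm(const SmLimits& lim, int threadsPerBlock,
                        int registersPerThread, int sharedMemPerBlock) {
    assert(threadsPerBlock > 0);
    int byThreads = lim.maxThreadsPerSm / threadsPerBlock;
    int byRegisters = registersPerThread > 0
        ? lim.registersPerSm / (registersPerThread * threadsPerBlock)
        : INT_MAX;
    int bySharedMem = sharedMemPerBlock > 0
        ? lim.sharedMemPerSm / sharedMemPerBlock
        : INT_MAX;
    return std::min({byThreads, lim.maxBlocksPerSm, byRegisters, bySharedMem});
}

// Occupancy = active warps per SM / maximum warps per SM, in [0, 1].
double estimateOccupancy(const SmLimits& lim, int threadsPerBlock,
                         int registersPerThread, int sharedMemPerBlock) {
    int blocks = estimateBlocksPerSm(lim, threadsPerBlock,
                                     registersPerThread, sharedMemPerBlock);
    // A partial warp still occupies a whole warp slot, so round up.
    int warpsPerBlock = (threadsPerBlock + lim.warpSize - 1) / lim.warpSize;
    int activeWarps = blocks * warpsPerBlock;
    int maxWarps = lim.maxThreadsPerSm / lim.warpSize;
    return static_cast<double>(activeWarps) / maxWarps;
}
```

Registers per thread can be read from the ptxas report (nvcc -Xptxas -v) or from the numRegs field filled in by cudaFuncGetAttributes, both of which predate CUDA 6.5.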

One common heuristic used to choose a good block size is to aim for high occupancy, which is the ratio of the number of active warps per multiprocessor to the maximum number of warps that can be active on the multiprocessor at once. -- CUDA Pro Tip: Occupancy API Simplifies Launch Configuration
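Applying the quoted definition to the question's own code gives a worked example. It assumes an sm_35 GPU, where at most 16 blocks can be resident per SM and an SM holds at most 2048 threads, and assumes that MyKernel's small register footprint is not the limiting factor, so numBlocks = min(2048 / 32, 16) = 16 for blockSize = 32:

```latex
\text{occupancy}
  = \frac{\text{activeWarps}}{\text{maxWarps}}
  = \frac{\text{numBlocks} \cdot \text{blockSize} / \text{warpSize}}
         {\text{maxThreadsPerMultiProcessor} / \text{warpSize}}
  = \frac{16 \cdot 32 / 32}{2048 / 32}
  = \frac{16}{64}
  = 25\%
```

Under the same assumptions the block cap, not the thread limit, is what binds here, so a larger block size such as 128 would let the same 16 blocks fill all 64 warp slots (16 · 128 = 2048 threads).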

Tags: c++, performance, cuda