Why is cudaLaunchCooperativeKernel() returning 'operation not permitted'?

Question

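I'm using a GTX 1050 (compute capability 6.1) with CUDA 11.0. My program needs grid-wide synchronization, so I have to use cudaLaunchCooperativeKernel(). I've checked the device query, and the GPU does support cooperative groups. However, I can't get the following kernel to run: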

extern "C" __global__ void test(int x) {
    if (x) {
        printf("%d", x);
        if (threadIdx.x == 0)
            test<<<1, 1>>>(--x);
    }
}

When I launch it with

cudaLaunchCooperativeKernel((void *)test, 1, 1, (void **) (&x));

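I get the error 'operation not permitted' (code 800). As far as I know, this error is returned when the device does not support cooperative groups, which is not the case here. So what is causing it?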
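A minimal sketch of checking both points from code, assuming the CUDA runtime API: the device attribute cudaDevAttrCooperativeLaunch reports whether cooperative launches are supported, and cudaGetErrorName maps a numeric code to its enum name (800 is cudaErrorNotPermitted). The helper names supportsCooperativeLaunch and reportError are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if `device` reports support for
// cooperative launches (cudaLaunchCooperativeKernel).
bool supportsCooperativeLaunch(int device) {
    int supported = 0;
    if (cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch,
                               device) != cudaSuccess)
        return false;  // treat a failed query as "not supported"
    return supported != 0;
}

// Hypothetical helper: prints the enum name, numeric value and description
// of a runtime error, e.g. "cudaErrorNotPermitted (800)".
void reportError(const char *what, cudaError_t err) {
    std::printf("%s: %s (%d): %s\n", what, cudaGetErrorName(err),
                static_cast<int>(err), cudaGetErrorString(err));
}
```

A device that passes supportsCooperativeLaunch(0) can still fail the launch with code 800, so the attribute check alone does not rule out other causes.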

Answer 1

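Your kernel uses dynamic parallelism (it launches test<<<1, 1>>> from device code). Kernels launched with cudaLaunchCooperativeKernel are not allowed to use dynamic parallelism.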
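One way to keep grid synchronization without dynamic parallelism is to turn the recursive device-side launch into a loop inside a single cooperative launch, using grid.sync() as the per-step barrier. The sketch below is a hypothetical rewrite of the question's kernel (countdown, runCountdown and the steps counter are illustrative names), assuming a single-block launch. Note that kernelArgs is an array of pointers, one per kernel parameter (void *args[] = { &x, ... }), not the argument itself cast to void **. Depending on the toolkit version, grid.sync() may also require compiling with -rdc=true.

```cuda
#include <cstdio>
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical rewrite: the recursion test(x) -> test(x - 1) becomes a loop,
// so no kernel is launched from device code.
__global__ void countdown(int x, int *steps) {
    cg::grid_group grid = cg::this_grid();
    for (; x > 0; --x) {
        if (grid.thread_rank() == 0) {
            printf("%d\n", x);
            ++*steps;  // count iterations so the host can check the result
        }
        grid.sync();  // grid-wide barrier; valid only under a cooperative launch
    }
}

// Launches countdown cooperatively; returns the number of steps, or -1 on error.
int runCountdown(int x) {
    int *d_steps = nullptr;
    if (cudaMalloc(&d_steps, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_steps, 0, sizeof(int));

    // kernelArgs: one pointer per kernel parameter, in declaration order.
    void *args[] = { &x, &d_steps };
    cudaError_t err = cudaLaunchCooperativeKernel((void *)countdown,
                                                  dim3(1), dim3(1), args);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int steps = -1;
    if (err == cudaSuccess)
        cudaMemcpy(&steps, d_steps, sizeof(int), cudaMemcpyDeviceToHost);
    else
        std::printf("launch failed: %s\n", cudaGetErrorName(err));
    cudaFree(d_steps);
    return steps;
}
```

Calling runCountdown(3) on a device that supports cooperative launch should print 3, 2 and 1 on separate lines and return 3.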
