Bind CUDA output array/surface to GL texture in ManagedCUDA

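I'm currently trying to connect some form of output from a CUDA program to a GL_TEXTURE_2D for use in rendering. I'm not too worried about the output type on the CUDA side (whether it's an array or a surface, I can adapt the program to it).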

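So the question is, how would I do this? (My current code copies the output array to system memory and uploads it to the GPU again using GL.TexImage2D, which is obviously very inefficient: when I disable those two pieces of code, it goes from approximately 300 kernel executions per second to about 400.)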

I already have a bit of test code that at least binds a GL texture to CUDA, but I can't even get a device pointer from it...

ctx = CudaContext.CreateOpenGLContext(CudaContext.GetMaxGflopsDeviceId(), CUCtxFlags.SchedAuto);

uint textureID = (uint)GL.GenTexture(); //create a texture in GL
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, width, height, 0, OpenTK.Graphics.OpenGL.PixelFormat.Rgba, PixelType.UnsignedByte, null); //allocate memory for the texture in GL

CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource(textureID, CUGraphicsRegisterFlags.WriteDiscard, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D, CUGraphicsMapResourceFlags.WriteDiscard); //using writediscard because the CUDA kernel will only write to this texture

//then, as far as I understood the ManagedCuda example, I have to do the following when I call my kernel
//(done without a CudaGraphicsInteropResourceCollection because I only have one item)
resultImage.Map();
var ptr = resultImage.GetMappedPointer(); //this crashes
kernelSample.Run(ptr); //pass the pointer to the kernel so it knows where to write
resultImage.UnMap();

The following exception is thrown when attempting to retrieve the pointer:

ErrorNotMappedAsPointer: This indicates that a mapped resource is not available for access as a pointer.

What do I need to do to fix this?

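And even if this exception can be resolved, how would I solve the other part of my question; that is, how do I work with the acquired pointer in my kernel? Can I use a surface for that? Can I access it as an arbitrary array (pointer arithmetic)?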

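Edit: Looking at this example, apparently I don't even need to map the resource every time I call the kernel and then call the render function. But how would this translate to ManagedCUDA?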

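Thanks to the example I found, I was able to translate this to ManagedCUDA (after browsing the source code and fiddling around), and I'm happy to report that this really does improve my throughput from about 300 samples per second to about 400 :)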

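Apparently a 3D array is required (I haven't seen any overloads in ManagedCUDA that take 2D arrays), but that doesn't really matter: I just use a 3D array/texture that is exactly 1 deep.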

id = GL.GenTexture();
GL.BindTexture(TextureTarget.Texture3D, id);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture3D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage3D(TextureTarget.Texture3D, 0, PixelInternalFormat.Rgba, width, height, 1, 0, OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero); //allocate memory for the texture but do not upload anything

CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource((uint)id, CUGraphicsRegisterFlags.SurfaceLDST, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_3D, CUGraphicsMapResourceFlags.WriteDiscard);
resultImage.Map();
CudaArray3D mappedArray = resultImage.GetMappedArray3D(0, 0);
resultImage.UnMap();

CudaSurface surfaceResult = new CudaSurface(kernelSample, "outputSurface", CUSurfRefSetFlags.None, mappedArray); //nothing needs to be done anymore - this call connects the 3D array from the GL texture to a surface reference in the kernel

Kernel code:

surface<void, cudaSurfaceType3D> outputSurface;

__global__ void Sample() {
    ...
    surf3Dwrite(output, outputSurface, pixelX, pixelY, 0);
}
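For anyone who wants to see the elided part spelled out, here is a minimal sketch of what a complete kernel along these lines might look like. It is not my actual kernel: the `width`/`height` parameters, the gradient colour and the `uchar4` pixel type are assumptions for illustration (`uchar4` matches a 4 x 8-bit RGBA texture). Two things to keep in mind: the x coordinate of a surface write is a byte offset, not a pixel index, and surface references like this were removed in CUDA 12, so this needs an older toolkit.

```cuda
#include <cuda_runtime.h>

// Surface reference that the host side binds by name ("outputSurface").
// Surface references only exist in toolkits before CUDA 12.
surface<void, cudaSurfaceType3D> outputSurface;

// extern "C" keeps the kernel name unmangled so it can be looked up as "Sample".
// width and height are hypothetical parameters added for this sketch.
extern "C" __global__ void Sample(int width, int height)
{
    int pixelX = blockIdx.x * blockDim.x + threadIdx.x;
    int pixelY = blockIdx.y * blockDim.y + threadIdx.y;

    // Skip threads that fall outside the image.
    if (pixelX >= width || pixelY >= height)
        return;

    // Placeholder colour: a simple gradient, assuming 8-bit RGBA texels.
    uchar4 output = make_uchar4(
        (unsigned char)(255 * pixelX / width),
        (unsigned char)(255 * pixelY / height),
        0,
        255);

    // x is given in bytes, y and z in elements; z is 0 because the
    // texture is exactly 1 deep.
    surf3Dwrite(output, outputSurface,
                pixelX * (int)sizeof(uchar4), pixelY, 0);
}
```

If `pixelX` in your own kernel is already a byte offset, drop the `sizeof` multiplication.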