Is it safe to feed unaligned buffers to MTLBuffer?

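When using Metal to draw pixel buffers from memory to the screen quickly, we create MTLBuffer objects with MTLDevice.makeBuffer(bytesNoCopy:..) so that the GPU can read the pixels directly from memory without copying them. Shared memory really is a must-have for good pixel transfer performance.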

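The catch is that makeBuffer requires a page-aligned memory address and a page-aligned length. Those requirements are not only in the documentation; they are also enforced with runtime assertions.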
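To make the requirement concrete, here is a minimal sketch of the check a caller could run before deciding whether a buffer can be passed to makeBuffer(bytesNoCopy:..) as-is. The helper name isPageAligned and its pageSize parameter are my own, for illustration only; they are not part of Metal.

```swift
import Foundation

/// Hypothetical helper (not a Metal API): returns true when both the address
/// and the length are whole multiples of the page size.
func isPageAligned(_ pointer: UnsafeRawPointer,
                   length: Int,
                   pageSize: Int = Int(getpagesize())) -> Bool {
    let address = UInt(bitPattern: pointer)
    // Both conditions must hold: the buffer starts on a page boundary
    // and covers a whole number of pages.
    return address % UInt(pageSize) == 0 && length % pageSize == 0
}
```

A buffer that fails this check is the case the rest of this question is about.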

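The code I am writing has to handle a variety of incoming resolutions and pixel formats, and occasionally I get unaligned buffers or unaligned lengths. After researching this, I found a hack that lets me use shared memory in those cases too.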

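Basically, I round the unaligned buffer address down to the nearest page boundary and use the offset parameter of makeTexture to make sure the GPU starts reading from the right place. Then I add that offset to length and round the result up to the next multiple of the page size. Obviously that memory is going to be valid (because memory is mapped in whole pages, so the rest of every page the buffer touches is mapped as well), and I think it is safe to assume the GPU will not write to or corrupt that memory.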

Here is the code I am using to create shared buffers from unaligned buffers:

import Foundation
import Metal

extension MTLDevice {
    /// Wraps a possibly unaligned pixel buffer in a shared, no-copy MTLBuffer and returns a texture over it.
    func makeTextureFromUnalignedBuffer(textureDescriptor: MTLTextureDescriptor, bufferPtr: UnsafeMutableRawPointer, bufferLength: UInt, bytesPerRow: Int) -> MTLTexture? {

        let pageSize = UInt(getpagesize())
        let pageSizeBitmask = pageSize - 1

        // Round the address down to the start of its page; the remainder becomes the texture offset.
        let address = UInt(bitPattern: bufferPtr)
        let offset = address & pageSizeBitmask
        guard let alignedBufferAddr = UnsafeMutableRawPointer(bitPattern: address & ~pageSizeBitmask) else {
            return nil
        }

        // 64 bytes is what I needed on my devices; the required alignment can vary by device
        // and pixel format (see minimumLinearTextureAlignment(for:)).
        assert(bytesPerRow % 64 == 0 && offset % 64 == 0, "Supplied bufferPtr and bytesPerRow must be aligned on a 64-byte boundary!")

        // Include the skipped prefix, then round the length up to a whole number of pages.
        let calculatedBufferLength = (bufferLength + offset + pageSizeBitmask) & ~pageSizeBitmask

        // deallocator is nil: the caller still owns the memory and must keep it alive while the texture is in use.
        guard let buffer = self.makeBuffer(bytesNoCopy: alignedBufferAddr, length: Int(calculatedBufferLength), options: .storageModeShared, deallocator: nil) else {
            return nil
        }
        return buffer.makeTexture(descriptor: textureDescriptor, offset: Int(offset), bytesPerRow: bytesPerRow)
    }
}
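To check the rounding on its own, without a device, here is a small sketch of the same arithmetic on plain integers. The function name pageAlignedSpan and the address, length, and 16 KB page size are made-up values for illustration.

```swift
/// Hypothetical helper mirroring the rounding in the extension above.
/// pageSize must be a power of two.
func pageAlignedSpan(address: UInt, length: UInt, pageSize: UInt)
    -> (base: UInt, offset: UInt, length: UInt) {
    let mask = pageSize - 1
    let offset = address & mask          // bytes between the page start and the data
    let base = address & ~mask           // address rounded down to a page boundary
    // Cover the skipped prefix plus the data, rounded up to whole pages.
    let alignedLength = (length + offset + mask) & ~mask
    return (base, offset, alignedLength)
}

// Example with made-up numbers: a 1000-byte buffer that starts 64 bytes into a 16 KB page.
let span = pageAlignedSpan(address: 0x1_0000_4040, length: 1000, pageSize: 16384)
print(span.base, span.offset, span.length)
```

With these inputs the base is 0x1_0000_4000, the offset is 64, and the length is 16384, that is, exactly one page. A buffer that straddles a page boundary ends up with a length of two or more pages.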

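I have tested this on many different buffers and it seems to work well (tested on iOS only, not on macOS). My question is: is this approach safe? Are there any obvious reasons why it would not work?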

Then again, if it is safe, why were these requirements imposed in the first place? Why doesn't the API just do this for us?

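I submitted an Apple TSI (Technical Support Incident) for this question, and the answer is basically yes, it is safe. In case anyone is interested, here is the exact response: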

After discussing your approach with engineering we concluded that it was valid and safe. Some noteworthy quotes:

“The framework shouldn’t care about the fact that the user doesn’t own the entire page, because it shouldn’t ever read before the offset where the valid data begins.”

“It really shouldn’t [care], but in general if the developer can use page-allocators rather than malloc for their incoming images, that would be nice.”

As to why the alignment constraints/assertions are in place:

“Typically mapping memory you don’t own into another address space is a bit icky, even if it works in practice. This is one reason why we required mapping to be page aligned, because the hardware really is mapping (and gaining write access) to the entire page.”
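For anyone who controls the allocation of the incoming images, the suggestion in the reply to use a page allocator rather than malloc could look roughly like the sketch below. It uses posix_memalign with the page size as the alignment; the helper name allocatePageAligned is my own, and this is only one way to get memory that starts on a page boundary and spans whole pages.

```swift
import Foundation

/// Hypothetical helper: allocates at least `byteCount` bytes starting on a page
/// boundary, with the length rounded up to a whole number of pages.
/// Returns nil if byteCount is not positive or the allocation fails.
func allocatePageAligned(byteCount: Int) -> (pointer: UnsafeMutableRawPointer, length: Int)? {
    guard byteCount > 0 else { return nil }
    let pageSize = Int(getpagesize())
    let length = (byteCount + pageSize - 1) & ~(pageSize - 1)
    var pointer: UnsafeMutableRawPointer?
    // posix_memalign returns 0 on success and stores the allocation in `pointer`.
    guard posix_memalign(&pointer, pageSize, length) == 0, let result = pointer else {
        return nil
    }
    return (result, length)
}
```

Memory obtained this way is released with free, for example from the deallocator closure passed to makeBuffer(bytesNoCopy:..), and it already meets the address and length requirements, so no rounding trick is needed.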