Metal data structure of the MTLTexture array for a Core ML custom layer

Question

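I'm confused about the MTLTexture array passed to a Core ML custom layer. In my mlmodel, the custom layer's input MTLTexture has 32 channels and the output has 8 channels. The MTLTexture data type is 16-bit float, i.e. half. So the input texture_array consists of 8 slices and the output consists of 2 slices.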

func encode(commandBuffer: MTLCommandBuffer, inputs: [MTLTexture], outputs: [MTLTexture]) throws {
    print(#function, inputs.count, outputs.count)
    if let encoder = commandBuffer.makeComputeCommandEncoder() {
        for i in 0..<inputs.count {
            encoder.setTexture(inputs[i], index: 0)
            encoder.setTexture(outputs[i], index: 1)
            // dispatch(pipeline:texture:) is a custom helper, not a standard Metal method.
            encoder.dispatch(pipeline: psPipeline, texture: inputs[i])
        }
        // End encoding once, after all dispatches; the encoder cannot be used afterwards.
        encoder.endEncoding()
    }
}

In my compute kernel function:

kernel void pixelshuffle(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    ushort3 gid [[thread_position_in_grid]])
{
    if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height()
        || gid.z>=inTexture.get_array_size()){
        return;
    }
    const half4 src = half4(inTexture.read(gid.xy, gid.z));
    //do other things
}

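If the input and output texture arrays are laid out as [C][H][W], then for gid = (0,0,0), which channels end up in src.rgba, and what are the coordinates of r, g, b and a in that layout?

Is it src.r = [0][0][0], src.g = [1][0][0], src.b = [2][0][0], src.a = [3][0][0]? Or is it src.r = [0][0][0], src.g = [0][0][1], src.b = [0][0][2], src.a = [0][0][3]?

Also, how can I get the raw data of the input texture inside the encode function and print it out?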

Answer 1

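In your compute kernel, src contains the RGBA values of a single pixel in the texture, and each of those values is a 16-bit float.

The texture's width corresponds to W, its height to H, and its slices to C, where each slice holds 4 channels.

So the number of slices in the texture is C/4 rounded up, i.e. floor((C + 3)/4), and gid.z goes from 0 to floor((C + 3)/4) - 1.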
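As a quick check of that arithmetic, here is a minimal sketch in plain Swift. The function name sliceCount(forChannels:) is made up for illustration and is not part of Core ML or Metal.

```swift
/// Number of RGBA slices needed to hold `channels` feature channels,
/// at four channels per slice, rounding up.
/// (Hypothetical helper, not a Core ML or Metal API.)
func sliceCount(forChannels channels: Int) -> Int {
    precondition(channels >= 0, "channel count must not be negative")
    return (channels + 3) / 4   // integer division, so this is floor((C + 3) / 4)
}

let inputSlices = sliceCount(forChannels: 32)   // the question's input
let outputSlices = sliceCount(forChannels: 8)   // the question's output
print(inputSlices, outputSlices)                // prints "8 2"
```

Valid values of gid.z are therefore 0..<sliceCount(forChannels: C), which matches the 8 input slices and 2 output slices described in the question.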

(Although that also depends on what your encoder.dispatch(pipeline:, texture:) function does, since it does not appear to be a standard method on MTLComputeCommandEncoder.)
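I can't see that helper, so the following is only a guess at the arithmetic such a function might perform: launch enough threadgroups to cover width x height x slices, rounding up. The names GridSize and threadgroupCounts are hypothetical, and the threadgroup dimensions in the example are arbitrary.

```swift
/// Threadgroup counts along each axis (stand-in for MTLSize so the
/// sketch runs without Metal). Hypothetical type.
struct GridSize: Equatable {
    var width: Int
    var height: Int
    var depth: Int
}

/// Rounds up so that the groups cover every pixel of every slice.
/// Threads that land outside the texture must be rejected by the
/// bounds check at the top of the kernel.
func threadgroupCounts(textureWidth: Int, textureHeight: Int, slices: Int,
                       groupWidth: Int, groupHeight: Int) -> GridSize {
    return GridSize(
        width: (textureWidth + groupWidth - 1) / groupWidth,
        height: (textureHeight + groupHeight - 1) / groupHeight,
        depth: slices   // one layer of threadgroups per texture slice
    )
}

// Example: a 100 x 60 texture with 8 slices and 32 x 16 threadgroups.
let grid = threadgroupCounts(textureWidth: 100, textureHeight: 60, slices: 8,
                             groupWidth: 32, groupHeight: 16)
print(grid.width, grid.height, grid.depth)   // prints "4 4 8"
```

In a real encoder these counts would be wrapped in an MTLSize and passed to dispatchThreadgroups(_:threadsPerThreadgroup:). If the helper dispatches with a depth of 1 instead, gid.z would only ever be 0 and only the first slice would be processed.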

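That means src.r is the first channel in the slice, .g is the second channel, .b is the third channel, and .a is the fourth channel in the slice. The first slice holds channels 0-3, the second slice holds channels 4-7, and so on.

So your first guess is the correct one:

src.r = [0][0][0], src.g = [1][0][0], src.b = [2][0][0], src.a = [3][0][0]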
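To make the mapping concrete, here is a small sketch in plain Swift that converts between a channel index in [C][H][W] terms and its position in the texture array. The names textureLocation(ofChannel:) and channelIndex(slice:component:) are illustrative only.

```swift
/// Where channel `c` of a [C][H][W] tensor lives in an RGBA texture array:
/// which slice, and which of the four components (0 = r, 1 = g, 2 = b, 3 = a).
/// The (x, y) pixel position is simply (w, h) and is unaffected.
func textureLocation(ofChannel c: Int) -> (slice: Int, component: Int) {
    precondition(c >= 0, "channel index must not be negative")
    return (slice: c / 4, component: c % 4)
}

/// Inverse mapping: the channel index for a given slice and component.
func channelIndex(slice: Int, component: Int) -> Int {
    precondition((0..<4).contains(component), "component must be 0...3")
    return slice * 4 + component
}

let componentNames = ["r", "g", "b", "a"]
for c in [0, 1, 2, 3, 4, 31] {
    let location = textureLocation(ofChannel: c)
    print("channel \(c) -> slice \(location.slice), .\(componentNames[location.component])")
}
```

For gid = (0, 0, 0) the kernel reads slice 0, so src.r, .g, .b and .a are channels 0, 1, 2 and 3 at pixel (0, 0). For gid = (0, 0, 1) the same components would be channels 4 through 7.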

Also note that I wrote a blog post about custom kernels in Core ML that might be useful: http://machinethink.net/blog/coreml-custom-layers/
