How to perform the outer product of 2 vectors in Metal shaders?

Question

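I'm developing a neural network that runs on the GPU on iOS. Because I'm using matrix notation (in order to backpropagate the errors), I need to be able to compute the outer product of 2 vectors.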


// Outer product of vector A and Vector B
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         device float *outVector [[ buffer(2) ]],
                         uint id [[ thread_position_in_grid ]]) {
    
    outVector[id] = inVectorA[id] * inVectorB[***?***]; // How to find this position on the thread group (or grid)?
}

Answer 1

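You are using thread_position_in_grid incorrectly. If you dispatch a 2D grid, it should be declared as uint2 or ushort2; otherwise you will only get the x coordinate. See Table 5.7 in the Metal Shading Language Specification.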

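I'm not sure which outer product we are talking about, but I think the output should be a matrix. If you store it linearly, the code to compute outVector should look something like this: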

kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         uint2 gridSize [[ threads_per_grid ]],
                         device float *outVector [[ buffer(2) ]],
                         uint2 id [[ thread_position_in_grid ]]) {
    
    outVector[id.y * gridSize.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
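
To check the indexing on the CPU, the kernel above can be emulated with two nested loops standing in for the 2D grid. This is only a sketch in plain C++, not Metal: the function name outerProductReference and the use of std::vector are assumptions made for illustration, and the grid is assumed to be exactly as wide as inVectorA and as tall as inVectorB.

```cpp
#include <cstddef>
#include <vector>

// CPU stand-in for the kernel: (x, y) plays the role of thread_position_in_grid
// and width plays the role of threads_per_grid.x.
// Hypothetical helper for illustration, not a Metal API.
std::vector<float> outerProductReference(const std::vector<float> &a,
                                         const std::vector<float> &b) {
    const std::size_t width = a.size();   // grid size in x
    const std::size_t height = b.size();  // grid size in y
    std::vector<float> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            // Same expression as the kernel body.
            out[y * width + x] = a[x] * b[y];
        }
    }
    return out;
}
```

With this layout, row y of the output holds every element of a scaled by b[y], so the result is a row-major matrix with b.size() rows and a.size() columns. If you need the transposed layout, swap the roles of the two vectors.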

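Also, if you dispatch a grid whose size is exactly the length of inVectorA by the length of inVectorB, you can use the threads_per_grid attribute on a kernel argument to find out how big the grid is.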

Alternatively, you can pass the sizes of the vectors along with the vectors themselves.
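
Passing the sizes explicitly also helps when the grid is larger than the matrix, for example when the grid dimensions are rounded up to a multiple of the threadgroup size. The sketch below emulates that case in plain C++ rather than Metal. The names countA, countB, outerProductThread, and outerProductPadded are assumptions made for illustration; in a real kernel the two lengths would presumably arrive as extra buffer arguments.

```cpp
#include <cstddef>
#include <vector>

// CPU sketch of a kernel body that receives the vector lengths explicitly.
// countA and countB are hypothetical extra arguments, not part of the original kernel.
void outerProductThread(const float *a, const float *b, float *out,
                        std::size_t countA, std::size_t countB,
                        std::size_t x, std::size_t y) {
    // Threads that fall outside the matrix do nothing.
    if (x >= countA || y >= countB) {
        return;
    }
    out[y * countA + x] = a[x] * b[y];
}

// Emulates dispatching a grid that may be larger than countA x countB.
std::vector<float> outerProductPadded(const std::vector<float> &a,
                                      const std::vector<float> &b,
                                      std::size_t gridWidth,
                                      std::size_t gridHeight) {
    std::vector<float> out(a.size() * b.size());
    for (std::size_t y = 0; y < gridHeight; ++y) {
        for (std::size_t x = 0; x < gridWidth; ++x) {
            outerProductThread(a.data(), b.data(), out.data(),
                               a.size(), b.size(), x, y);
        }
    }
    return out;
}
```

Because the row stride is countA rather than the grid width, the output stays densely packed even when the grid is padded.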
