Could not get data from GPU when using Metal
I wrote some simple code, shown below, to check whether the GPU can do some computation work.
id<MTLDevice> device = MTLCreateSystemDefaultDevice();
NSLog(@"Device: %@", [device name]);
id<MTLCommandQueue> commandQueue = [device newCommandQueue];
NSError * ns_error = nil;
id<MTLLibrary> defaultLibrary = [device newLibraryWithFile:@"/Users/i/tmp/tmp6/s.metallib" error:&ns_error];
// Buffer for storing encoded commands that are sent to GPU
id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
// Encoder for GPU commands
id <MTLComputeCommandEncoder> computeCommandEncoder = [commandBuffer computeCommandEncoder];
//set input and output data
float tmpbuf[1000];
float outbuf[1000];
for( int i = 0; i < 1000; i++ )
{
    tmpbuf[i] = i;
    outbuf[i] = 0;
}
int tmp_length = 100*sizeof(float);
id<MTLBuffer> inVectorBuffer = [device newBufferWithBytes: tmpbuf length: tmp_length options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: inVectorBuffer offset: 0 atIndex: 0 ];
id<MTLBuffer> outVectorBuffer = [device newBufferWithBytes: outbuf length: tmp_length options: MTLResourceOptionCPUCacheModeDefault ];
[computeCommandEncoder setBuffer: outVectorBuffer offset: 0 atIndex: 1 ];
//get function
id<MTLFunction> newfunc = [ defaultLibrary newFunctionWithName:@"sigmoid" ];
//get pipeline state
id<MTLComputePipelineState> cpipeline = [device newComputePipelineStateWithFunction: newfunc error:&ns_error ];
[computeCommandEncoder setComputePipelineState:cpipeline ];
//
MTLSize ts= {10, 10, 1};
MTLSize numThreadgroups = {2, 5, 1};
[computeCommandEncoder dispatchThreadgroups:numThreadgroups threadsPerThreadgroup:ts];
[ computeCommandEncoder endEncoding ];
[ commandBuffer commit];
//get data computed by GPU
NSData* outdata = [NSData dataWithBytesNoCopy:[outVectorBuffer contents ] length: tmp_length freeWhenDone:false ];
float final_out[1000];
[outdata getBytes:final_out length:tmp_length];
//In my opinion, each value of final_out should be 10.0
for( int i = 0; i < 1000; i++ )
{
    printf("%.2f : %.2f\n", tmpbuf[i], final_out[i]);
}
The shader file, named s.shader, is shown below. It assigns 10.0 to the output.
#include <metal_stdlib>
using namespace metal;

kernel void sigmoid(const device float *inVector [[ buffer(0) ]],
                    device float *outVector [[ buffer(1) ]],
                    uint id [[ thread_position_in_grid ]]) {
    // This calculates sigmoid for _one_ position (=id) in a vector per call on the GPU
    outVector[id] = 10.0;
}
In the code above, I read back the data computed by the GPU through the variable final_out. In my opinion, every value of final_out should be 10.0, as assigned in s.shader. However, all values of final_out are 0. Is there a problem with getting the data back from the GPU?
Thanks
Committing a command buffer merely tells the driver to begin executing it. If you want to read back the results of the GPU's work on the CPU, you need to either block the current thread with -waitUntilCompleted, or add a block to be invoked when the command buffer finishes by using the -addCompletedHandler: method.
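For example, a minimal sketch of either approach, reusing the commandBuffer and outVectorBuffer variables from the question's code (a command buffer can only be committed once, so pick one):
// Option 1: block the CPU until the GPU has finished this command buffer
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
const float *results = (const float *)[outVectorBuffer contents]; // now holds the kernel's output
// Option 2: read the results asynchronously; the handler must be added before commit
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    const float *asyncResults = (const float *)[outVectorBuffer contents];
    NSLog(@"first value: %f", asyncResults[0]);
}];
[commandBuffer commit];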
One other note: you appear to be using buffers with the Shared storage mode. If you were ever to use a buffer with the Managed storage mode, you would also need to create a blit command encoder and call synchronizeResource: with the appropriate buffer, then wait for completion as described above, in order to copy the results back from the GPU.
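A minimal sketch of that synchronization step, assuming outVectorBuffer had been created with the MTLResourceStorageModeManaged option:
// Encode a blit pass so the GPU's writes to the managed buffer become visible to the CPU
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder synchronizeResource:outVectorBuffer];
[blitEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
// [outVectorBuffer contents] is now safe to read on the CPU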