CUDA cublasGetMatrix / cublasSetMatrix fails | Explanation of arguments

Question

I am trying to copy the matrix [1 2 3 4 ; 5 6 7 8 ; 9 10 11 12], stored in column-major format as x, by first copying it into the matrix d_x on an NVIDIA GPU with cublasSetMatrix(), and then copying d_x to y with cublasGetMatrix().
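Because x is declared as four C rows of three floats, its elements are stored in memory as 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12, which is the 3 × 4 matrix above in column-major order with a leading dimension of 3. The printing loops in the program therefore show the 4 × 3 C array, i.e. the transpose of that matrix. The host-only sketch below indexes and prints the buffer as the 3 × 4 column-major matrix instead. The IDX2C formula is the same as the macro used in the cuBLAS documentation's example code; col_major_at and print_col_major are hypothetical helper names.

```c
#include <stdio.h>

/* Column-major index of element (i, j) with leading dimension ld,
   the same formula as the IDX2C macro in NVIDIA's cuBLAS examples. */
#define IDX2C(i, j, ld) (((j) * (ld)) + (i))

/* Hypothetical helper: read element (i, j) of a column-major matrix
   stored in a flat buffer. */
static float col_major_at(const float *a, int i, int j, int ld)
{
    return a[IDX2C(i, j, ld)];
}

/* Hypothetical helper: print a rows x cols column-major matrix in its
   mathematical orientation, one matrix row per output line. */
static void print_col_major(const char *name, const float *a,
                            int rows, int cols, int ld)
{
    printf("%s\n", name);
    for (int i = 0; i < rows; i++) {
        printf("Row %i:", i + 1);
        for (int j = 0; j < cols; j++)
            printf(" %f", col_major_at(a, i, j, ld));
        putchar('\n');
    }
}
```

For example, print_col_major("X", &x[0][0], 3, 4, 3) prints "Row 1: 1.000000 2.000000 3.000000 4.000000" followed by the rows 5 to 8 and 9 to 12.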

#include<stdio.h>
#include"cublas_v2.h"

int main()
{
    cublasHandle_t hand;
    float x[][3] = { {1,5,9} , {2,6,10} , {3,7,11} , {4,8,12} };
    float y[4][3] = {};
    float *d_x;

    printf("X\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",x[i][j]);
        }
        putchar('\n');
    }
    printf("Y\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",y[i][j]);
        }
        putchar('\n');
    }

    cublasCreate( &hand );
    cudaMalloc( &d_x,sizeof(d_x) );
    cublasSetMatrix( 3,4,sizeof(float),x,3,d_x,3 );
    cublasGetMatrix( 3,4,sizeof(float),d_x,3,y,3 );

    printf("X\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",x[i][j]);
        }
        putchar('\n');
    }
    printf("Y\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",y[i][j]);
        }
        putchar('\n');
    }


    cudaFree( d_x );
    cublasDestroy( hand );
    return 0;
}

After the copy, the output shows that y is filled with 0s.

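Did any of the cuBLAS function calls fail?

Or/And

Am I passing wrong arguments to the cuBLAS functions?

Also, please explain the purpose of each argument of these functions.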

Using a GeForce GTX 650 and CUDA 6.5 x86_64 on Fedora 21.

Answer 1

The only actual problem in your code is this line:

cudaMalloc( &d_x,sizeof(d_x) );

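sizeof(d_x) is only the size of a pointer (8 bytes on x86_64), not the 48 bytes that the 3 × 4 float matrix needs. You can fix it like this: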

cudaMalloc( &d_x,sizeof(x) );
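The difference comes from how sizeof behaves in C: applied to a pointer it yields the pointer's size, while applied to an array that is in scope it yields the size of the whole array. A minimal host-only sketch, assuming the same x as in the question; d_x here is just a host pointer standing in for the device pointer and is never dereferenced, and the bytes_* function names are hypothetical.

```c
#include <stddef.h>

/* Same host matrix as in the question: 3x4, column-major, 12 floats. */
static float x[4][3] = { {1,5,9}, {2,6,10}, {3,7,11}, {4,8,12} };

/* Stand-in for the device pointer; only its size matters here. */
static float *d_x;

/* Byte count passed to cudaMalloc in the original code: one pointer. */
static size_t bytes_buggy(void) { return sizeof(d_x); }

/* Byte count in the fixed code: the whole array. */
static size_t bytes_fixed(void) { return sizeof(x); }

/* Size computed from the matrix dimensions; still correct when the
   matrix is only reachable through a pointer (e.g. a function parameter). */
static size_t bytes_from_dims(int rows, int cols)
{
    return (size_t)rows * (size_t)cols * sizeof(float);
}
```

The dimension-based form is the safer habit: if x were passed into a function, even as a parameter declared float x[][3], it would be adjusted to a pointer and sizeof(x) would silently become the pointer size again.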

If you want to know whether a CUBLAS API call failed, you should check the status code that the call returns:

cublasStatus_t res = cublasSetMatrix( 3,4,sizeof(float),x,3,d_x,3 );

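As for the description of the arguments, everything you have is correct (apart from the allocation error involving d_x). So it is not clear which one you need explained, but they are all described in the documentation.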
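For reference, cublasSetMatrix(rows, cols, elemSize, A, lda, B, ldb) takes, in order: the number of rows and columns of the matrix, the size of one element in bytes, the source pointer and its leading dimension, and the destination pointer and its leading dimension; cublasGetMatrix takes the same list with A on the device and B on the host. The leading dimension is the distance, in elements, between the starts of consecutive columns, so for a densely stored matrix it equals rows. The host-only sketch below mimics that column-by-column copy so the role of each argument can be tested without a GPU. copy_col_major is a hypothetical helper, not part of cuBLAS, and its argument checks are this sketch's own, not the library's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only helper mirroring the argument order of
   cublasSetMatrix/cublasGetMatrix: copies a rows x cols column-major block
   of elemSize-byte elements from A (leading dimension lda) to B (leading
   dimension ldb). Returns 0 on success, -1 on arguments it cannot handle. */
static int copy_col_major(int rows, int cols, int elemSize,
                          const void *A, int lda, void *B, int ldb)
{
    int min_ld = rows > 1 ? rows : 1;
    if (rows < 0 || cols < 0 || elemSize <= 0 || lda < min_ld || ldb < min_ld)
        return -1;

    const unsigned char *src = A;
    unsigned char *dst = B;
    for (int j = 0; j < cols; j++) {
        /* Column j starts j * ld elements into each buffer; only the first
           `rows` elements of each column are copied, padding is skipped. */
        memcpy(dst + (size_t)j * ldb * elemSize,
               src + (size_t)j * lda * elemSize,
               (size_t)rows * elemSize);
    }
    return 0;
}
```

With x and y from the question, copy_col_major(3, 4, sizeof(float), x, 3, y, 3) performs the intended round trip on the host; a destination leading dimension larger than rows (e.g. ldb = 5) leaves the padding elements of each destination column untouched.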

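CUDA runtime API calls such as cudaMalloc also return an error code, so you should check those as well. Any time you are having trouble with CUDA code, it is a good idea to use proper CUDA error checking. You can also run your code under cuda-memcheck as a quick test.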
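One common way to check every call without cluttering the code is a small checking macro. The sketch below is deliberately generic: CHECK_STATUS and report_status are hypothetical names, and the macro only compares a returned status with the expected success value, so it compiles without CUDA headers.

```c
#include <stdio.h>

/* Hypothetical helper: returns 0 if status equals ok, otherwise prints the
   failing call with its source location to stderr and returns 1. */
static int report_status(long status, long ok, const char *call,
                         const char *file, int line)
{
    if (status == ok)
        return 0;
    fprintf(stderr, "%s:%d: %s returned %ld (expected %ld)\n",
            file, line, call, status, ok);
    return 1;
}

/* Evaluates `call` once, compares its result with `ok`, and yields 1 on
   failure. #call turns the call's source text into the error message. */
#define CHECK_STATUS(call, ok) \
    report_status((long)(call), (long)(ok), #call, __FILE__, __LINE__)
```

In the question's program it could wrap both kinds of call, e.g. CHECK_STATUS(cudaMalloc(&d_x, sizeof(x)), cudaSuccess) and CHECK_STATUS(cublasSetMatrix(3, 4, sizeof(float), x, 3, d_x, 3), CUBLAS_STATUS_SUCCESS). For runtime errors, cudaGetErrorString() turns the cudaError_t value into a readable message.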
