将推力矢量输入 getrf/getri 时出现问题

Question

继续我的 CUDA 初学者之旅，有人介绍我使用 Thrust，它似乎是一个方便的库，让我免于显式内存（取消）分配的麻烦。

我已经尝试将它与一些 cuBLAS 例程结合使用，例如gemv，通过使用 thrust::raw_pointer_cast(array.data()) 生成指向底层存储的原始指针，然后将其提供给例程，它工作得很好。

当前任务是获取矩阵的逆，为此我使用了 getrfBatched 和 getriBatched。来自文档：

cublasStatus_t cublasDgetrfBatched(cublasHandle_t handle,
                                   int n, 
                                   double *Aarray[],
                                   int lda, 
                                   int *PivotArray,
                                   int *infoArray,
                                   int batchSize);

哪里

Aarray - device - array of pointers to <type> array

自然地，我想我可以使用另一层推力向量来表达这个指针数组，并再次将其原始指针提供给 cuBLAS，所以这就是我所做的：

void test()
{
    thrust::device_vector<double> in(4);
    in[0] = 1;
    in[1] = 3;
    in[2] = 2;
    in[3] = 4;
    cublasStatus_t stat;
    cublasHandle_t handle;
    stat = cublasCreate(&handle);
    thrust::device_vector<double> out(4, 0);
    thrust::device_vector<int> pivot(2, 0);
    int info = 0;
    thrust::device_vector<double*> in_array(1);
    in_array[0] = thrust::raw_pointer_cast(in.data());
    thrust::device_vector<double*> out_array(1);
    out_array[0] = thrust::raw_pointer_cast(out.data());
    stat = cublasDgetrfBatched(handle, 2,
        (double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()), &info, 1);
    stat = cublasDgetriBatched(handle, 2,
        (const double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()),
        (double**)thrust::raw_pointer_cast(out_array.data()), 2, &info, 1);
}

执行时，stat表示CUBLAS_STATUS_SUCCESS (0)，info表示0（执行成功），但如果我尝试访问in的元素, pivot 或 out 用标准的括号表示法，我打了一个 thrust::system::system_error。在我看来，相应的内存以某种方式损坏了。

我在这里遗漏了什么明显的东西吗？

Answer 1

cublas<t>getrfBatched 的 documentation 表示 infoArray 参数应为指向设备内存的指针。

相反，您传递了一个指向主机内存的指针：

int info = 0;
...
stat = cublasDgetrfBatched(handle, 2,
    (double**)thrust::raw_pointer_cast(in_array.data()), 2,
    thrust::raw_pointer_cast(pivot.data()), &info, 1);
                                            ^^^^^

如果您运行您的代码使用 cuda-memcheck（在我看来，这总是一个好习惯，任何时候您在使用 CUDA 代码时遇到问题，之前向别人寻求帮助）你会收到 "invalid global write of size 4" 的错误。这是因为 cublasDgetrfBatched() 启动的内核试图使用您提供的普通主机指针将 info 数据写入设备内存，这在 CUDA 中始终是非法的。

出于性能原因，CUBLAS 本身不会捕获此类错误。然而，在某些情况下，thrust API 使用更严格的同步和错误检查。因此，在这个错误之后使用 thrust 代码报告错误，即使错误与 thrust 无关（它是以前内核启动的异步报告错误）。

解决方法很简单；为 info:

提供设备存储

$ cat t329.cu
#include <thrust/device_vector.h>
#include <cublas_v2.h>
#include <iostream>

void test()
{
    thrust::device_vector<double> in(4);
    in[0] = 1;
    in[1] = 3;
    in[2] = 2;
    in[3] = 4;
    cublasStatus_t stat;
    cublasHandle_t handle;
    stat = cublasCreate(&handle);
    thrust::device_vector<double> out(4, 0);
    thrust::device_vector<int> pivot(2, 0);
    thrust::device_vector<int> info(1, 0);
    thrust::device_vector<double*> in_array(1);
    in_array[0] = thrust::raw_pointer_cast(in.data());
    thrust::device_vector<double*> out_array(1);
    out_array[0] = thrust::raw_pointer_cast(out.data());
    stat = cublasDgetrfBatched(handle, 2,
        (double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()), thrust::raw_pointer_cast(info.data()), 1);
    stat = cublasDgetriBatched(handle, 2,
        (const double**)thrust::raw_pointer_cast(in_array.data()), 2,
        thrust::raw_pointer_cast(pivot.data()),
        (double**)thrust::raw_pointer_cast(out_array.data()), 2, thrust::raw_pointer_cast(info.data()), 1);
    for (int i = 0; i < 4; i++) {
      double test = in[i];
      std::cout << test << std::endl;
      }
}


int main(){

  test();
}
$ nvcc -o t329 t329.cu -lcublas
t329.cu(12): warning: variable "stat" was set but never used

$ cuda-memcheck ./t329
========= CUDA-MEMCHECK
3
0.333333
4
0.666667
========= ERROR SUMMARY: 0 errors
$

您会注意到上述代码中的此更改适用于两个 cublas 调用的用法，因为 infoArray 参数对两者具有相同的期望。

将推力矢量输入 getrf/getri 时出现问题

Problem feeding Thrust vector into getrf/getri

c++

cuda

thrust

cublas