How to use MAGMA with an NVIDIA GPU card instead of CPU LAPACKE to invert a large matrix

I need to invert large matrices, and I would like to modify my current LAPACKE routine so that it takes advantage of the power of an NVIDIA GPU card.

Indeed, my LAPACKE routine works well for relatively small matrices, but not for large ones.
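For a sense of scale, a dense N x N matrix of doubles needs 8 * N^2 bytes, and that amount has to fit both in host memory and on the GPU. The sketch below only does this arithmetic; the helper name `dense_matrix_bytes` is hypothetical and not part of the original code.

```cpp
#include <cstdint>

// Bytes needed to store one dense n x n matrix of doubles.
// Hypothetical helper for illustration; not part of the original code.
std::uint64_t dense_matrix_bytes(std::uint64_t n) {
    // 64-bit arithmetic, so n * n does not overflow for realistic sizes.
    return n * n * sizeof(double);
}
```

For example, N = 50000 already requires about 20 GB for a single copy of the matrix, before any workspace is counted.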

Below is the implementation of this LAPACKE routine:

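#include <mkl.h>
#include <cstddef>
#include <vector>

using std::vector;

// Invert the square matrix F_matrix into F_output (both passed by reference).
// F_output must already be sized N x N; it may be the same object as F_matrix.
void matrix_inverse_lapack(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {

  // Loop indices
  int i, j;

  // Flat array index; size_t avoids int overflow in i*N + j for large N
  std::size_t idx;

  // Size of F_matrix
  int N = F_matrix.size();

  // Pivot indices filled in by the LU factorization
  int *IPIV = new int[N];

  // Flat row-major copy of the matrix, overwritten by its inverse
  double *arr = new double[(std::size_t)N*N];

  for (i = 0; i<N; i++){
    for (j = 0; j<N; j++){
      idx = (std::size_t)i*N + j;
      arr[idx] = F_matrix[i][j];
    }
  }

  // LAPACKE routines: LU factorization, then inversion from the LU factors.
  // Both return 0 on success; the return values are not checked here.
  int info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);
  int info2 = LAPACKE_dgetri(LAPACK_ROW_MAJOR, N, arr, N, IPIV);

  for (i = 0; i<N; i++){
    for (j = 0; j<N; j++){
      idx = (std::size_t)i*N + j;
      F_output[i][j] = arr[idx];
    }
  }

  delete[] IPIV;
  delete[] arr;
}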
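The routine stores `info1` and `info2` but never inspects them. A minimal sketch of how they could be decoded follows, assuming the standard LAPACK convention (0 for success, a negative value for an illegal argument, a positive value i when U(i,i) is exactly zero). The helper name `describe_lapack_info` is hypothetical. The values -1010 and -1011 are the codes that `lapacke.h` defines as `LAPACK_WORK_MEMORY_ERROR` and `LAPACK_TRANSPOSE_MEMORY_ERROR`; they are written as literals here so that the sketch compiles without the LAPACKE headers.

```cpp
#include <string>

// Turn an info code returned by LAPACKE_dgetrf / LAPACKE_dgetri into a message.
// Hypothetical helper for illustration; not part of the original code.
std::string describe_lapack_info(int info) {
    if (info == 0) {
        return "success";
    }
    // LAPACKE-specific codes for failed internal allocations (see lapacke.h).
    if (info == -1010 || info == -1011) {
        return "LAPACKE could not allocate temporary memory";
    }
    if (info < 0) {
        return "argument " + std::to_string(-info) + " had an illegal value";
    }
    // info > 0: a zero pivot was found, so the inverse cannot be computed.
    return "U(" + std::to_string(info) + "," + std::to_string(info) +
           ") is exactly zero; the matrix is singular";
}
```

Since the row-major LAPACKE interface may allocate temporary storage internally, a memory error code is one possible symptom when a large matrix fails, so printing `describe_lapack_info(info1)` before calling `LAPACKE_dgetri` may help narrow down the cause.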

It is called like this to invert the CO_CL matrix:

matrix_inverse_lapack(CO_CL, CO_CL);

CO_CL is defined as:

vector<vector<double>> CO_CL(lsize*(2*Dim_x+Dim_y), vector<double>(lsize*(2*Dim_x+Dim_y), 0));
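Because CO_CL is a nested vector, its rows are not contiguous in memory, so any LAPACK-style or GPU routine first needs a flat copy. The following is a minimal sketch of that conversion using `std::vector<double>` instead of manual `new`/`delete`; the function names `flatten_row_major` and `unflatten_row_major` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Copy a square nested vector into one contiguous row-major buffer.
// Hypothetical helper for illustration; not part of the original code.
std::vector<double> flatten_row_major(const std::vector<std::vector<double>> &m) {
    const std::size_t n = m.size();
    std::vector<double> flat(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            flat[i * n + j] = m[i][j];
        }
    }
    return flat;
}

// Copy a contiguous row-major buffer back into an already sized nested vector.
void unflatten_row_major(const std::vector<double> &flat,
                         std::vector<std::vector<double>> &m) {
    const std::size_t n = m.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            m[i][j] = flat[i * n + j];
        }
    }
}
```

The pointer `flat.data()` can then be handed to a routine that expects a `double*`. MAGMA, like Fortran LAPACK, assumes column-major storage, so a row-major buffer is read as the transpose of the matrix. For inversion this is harmless: the inverse of the transpose is the transpose of the inverse, so reading the result back as row-major still yields the inverse.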

In my case, how can I use MAGMA for NVIDIA instead of LAPACKE to invert the matrix?

Update 1: I have downloaded magma-2.6.1. First, I have to modify the original Makefile:

CXX = icpc -std=c++11 -O3 -xHost
CXXFLAGS = -Wall -c -I${MKLROOT}/include -I/opt/intel/oneapi/compiler/latest/linux/compiler/include -qopenmp -qmkl=parallel
LDFLAGS = -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/../compiler/lib -qopenmp -qmkl
SOURCES = main_intel.cpp XSAF_C_intel.cpp
EXECUTABLE = main_intel.exe

I don't see any MKL headers in magma-2.6.1: are nvcc and MKL compatible?

Try using magma_sgetri_gpu (matrix inverse in single precision, GPU interface). This function computes, in single precision, the inverse A^-1 of an m x m matrix A.

magma_ssetmatrix(m, m, a, m, d_a, m, queue);            // copy a -> d_a
magmablas_slacpy(MagmaFull, m, m, d_a, m, d_r, m, queue); // copy d_a -> d_r
// find the inverse matrix: d_a * X = I using the LU factorization
// with partial pivoting and row interchanges computed by
// magma_sgetrf_gpu; row i is interchanged with row piv(i);
// d_a is an m x m matrix; d_a is overwritten by the inverse
gpu_time = magma_sync_wtime(NULL);
magma_sgetrf_gpu(m, m, d_a, m, piv, &info);               // LU factorization on the GPU
magma_sgetri_gpu(m, d_a, m, piv, dwork, ldwork, &info);   // inverse from the LU factors
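The snippet above works in single precision, while the question uses `double`; MAGMA also provides the double-precision counterparts `magma_dgetrf_gpu` and `magma_dgetri_gpu`. Whichever precision is used, it is worth checking the result once it has been copied back to the host. The sketch below is plain C++ with no MAGMA dependency and measures how far A * A^-1 is from the identity; the function name `max_inverse_residual` is hypothetical, and both matrices are assumed to be stored flat in row-major order.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute entry of (A * A_inv - I) for n x n row-major matrices.
// Hypothetical helper for illustration; values near 0 indicate a good inverse.
double max_inverse_residual(const std::vector<double> &a,
                            const std::vector<double> &a_inv,
                            std::size_t n) {
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Entry (i, j) of the product A * A_inv.
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * a_inv[k * n + j];
            }
            const double target = (i == j) ? 1.0 : 0.0;
            worst = std::fmax(worst, std::fabs(sum - target));
        }
    }
    return worst;
}
```

This check costs O(n^3) operations on the CPU, so for very large matrices it is more practical to run it on a smaller test case first or to examine only a few rows.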

The official NVIDIA documentation has more examples, which you can also look at: