How to use MAGMA with an NVIDIA GPU card instead of CPU LAPACKE to invert a large matrix
I need to invert large matrices, and I would like to modify my current LAPACKE routine to exploit the power of an NVIDIA GPU card.
Indeed, my LAPACKE routine works well for relatively small matrices, but not for large ones.
Here is the implementation of this LAPACKE routine:
#include <mkl.h>
#include <vector>
using std::vector;

// Invert F_matrix into F_output (passing matrices by reference)
void matrix_inverse_lapack(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    // Size of F_matrix
    int N = F_matrix.size();
    int *IPIV = new int[N];
    // Flattened row-major copy of F_matrix
    double *arr = new double[N * N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            arr[i * N + j] = F_matrix[i][j];
        }
    }
    // LAPACKE routines: LU factorization, then inverse from the LU factors
    int info1 = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, N, N, arr, N, IPIV);
    int info2 = LAPACKE_dgetri(LAPACK_ROW_MAJOR, N, arr, N, IPIV);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            F_output[i][j] = arr[i * N + j];
        }
    }
    delete[] IPIV;
    delete[] arr;
}
It is called like this to invert the CO_CL matrix:
matrix_inverse_lapack(CO_CL, CO_CL);
where CO_CL is defined as:
vector<vector<double>> CO_CL(lsize*(2*Dim_x+Dim_y), vector<double>(lsize*(2*Dim_x+Dim_y), 0));
How can I use MAGMA for NVIDIA instead of LAPACKE to invert the matrix in my case?
Update 1: I have downloaded magma-2.6.1.
First, I have to modify the original Makefile:
CXX = icpc -std=c++11 -O3 -xHost
CXXFLAGS = -Wall -c -I${MKLROOT}/include -I/opt/intel/oneapi/compiler/latest/linux/compiler/include -qopenmp -qmkl=parallel
LDFLAGS = -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/../compiler/lib -qopenmp -qmkl
SOURCES = main_intel.cpp XSAF_C_intel.cpp
EXECUTABLE = main_intel.exe
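On top of that, I expect the Makefile will need MAGMA and CUDA include and link paths, roughly along these lines (MAGMADIR and CUDADIR are placeholders for the actual install locations; MAGMA typically needs -lmagma plus the CUDA runtime, cuBLAS and cuSPARSE libraries):

```makefile
# Hypothetical paths - adjust to the actual MAGMA / CUDA install locations
MAGMADIR = /usr/local/magma
CUDADIR  = /usr/local/cuda

CXXFLAGS += -I$(MAGMADIR)/include -I$(CUDADIR)/include
LDFLAGS  += -L$(MAGMADIR)/lib -Wl,-rpath,$(MAGMADIR)/lib -lmagma \
            -L$(CUDADIR)/lib64 -lcudart -lcublas -lcusparse
```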
I don't see MKL headers in magma-2.6.1: are nvcc and MKL compatible?
I am trying to use magma_sgetri_gpu (single-precision matrix inversion, GPU interface). This function computes, in single precision, the inverse A^−1 of an m × m matrix A.
magma_ssetmatrix(m, m, a, m, d_a, m, queue);               // copy a -> d_a
magmablas_slacpy(MagmaFull, m, m, d_a, m, d_r, m, queue);  // d_a -> d_r
// find the inverse matrix: d_a*X = I, using the LU factorization
// with partial pivoting and row interchanges computed by
// magma_sgetrf_gpu; row i is interchanged with row piv(i);
// d_a - m x m matrix; d_a is overwritten by the inverse
gpu_time = magma_sync_wtime(NULL);
magma_sgetrf_gpu(m, m, d_a, m, piv, &info);
magma_sgetri_gpu(m, d_a, m, piv, dwork, ldwork, &info);
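As far as I can tell, this snippet leaves out the workspace that magma_sgetri_gpu requires; my guess at the missing setup (dwork, ldwork, and stopping the timer), based on the MAGMA headers, is:

```cpp
// Workspace for magma_sgetri_gpu, sized with MAGMA's tuned block size
magma_int_t ldwork = m * magma_get_sgetri_nb(m);
float *dwork;
magma_smalloc(&dwork, ldwork);                 // workspace allocated on the GPU

// ... magma_sgetrf_gpu / magma_sgetri_gpu calls as above ...

gpu_time = magma_sync_wtime(NULL) - gpu_time;  // stop the timer
magma_free(dwork);
```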
The official NVIDIA documentation has quite a few examples; you can also take a look at those.
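Putting the pieces together, here is my sketch of a double-precision MAGMA replacement for matrix_inverse_lapack, to match the precision of the original routine. The function name matrix_inverse_magma is my own, and I am assuming magma_init()/magma_finalize() can live inside the function for simplicity (in a real program they would be called once at startup/shutdown):

```cpp
#include <magma_v2.h>
#include <vector>
using std::vector;

// Sketch: invert F_matrix into F_output on the GPU with MAGMA
void matrix_inverse_magma(vector<vector<double>> const &F_matrix, vector<vector<double>> &F_output) {
    magma_init();
    magma_int_t N = F_matrix.size();
    magma_queue_t queue;
    magma_queue_create(0, &queue);             // queue on device 0

    // Pinned host buffer, filled in column-major order (MAGMA convention)
    double *h_A;
    magma_dmalloc_pinned(&h_A, N * N);
    for (magma_int_t j = 0; j < N; j++)
        for (magma_int_t i = 0; i < N; i++)
            h_A[j * N + i] = F_matrix[i][j];

    double *d_A;
    magma_dmalloc(&d_A, N * N);
    magma_dsetmatrix(N, N, h_A, N, d_A, N, queue);   // copy h_A -> d_A

    // LU factorization, then inverse from the LU factors, all on the GPU
    magma_int_t *piv = new magma_int_t[N];
    magma_int_t info;
    magma_dgetrf_gpu(N, N, d_A, N, piv, &info);

    magma_int_t ldwork = N * magma_get_dgetri_nb(N);
    double *dwork;
    magma_dmalloc(&dwork, ldwork);
    magma_dgetri_gpu(N, d_A, N, piv, dwork, ldwork, &info);

    // Copy the inverse back and undo the column-major layout
    magma_dgetmatrix(N, N, d_A, N, h_A, N, queue);
    for (magma_int_t j = 0; j < N; j++)
        for (magma_int_t i = 0; i < N; i++)
            F_output[i][j] = h_A[j * N + i];

    delete[] piv;
    magma_free(dwork);
    magma_free(d_A);
    magma_free_pinned(h_A);
    magma_queue_destroy(queue);
    magma_finalize();
}
```

If this is right, it should be a drop-in replacement for the call matrix_inverse_lapack(CO_CL, CO_CL).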