Does the leading dimension in cuBLAS allow for accessing any submatrix?

Question

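I'm trying to understand the idea of the leading dimension in cuBLAS. The documentation mentions that lda must always be greater than or equal to the number of rows in the matrix.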

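If I have a 100x100 matrix A and I want to access A(90:99, 0:99), what would the arguments of cublasSetMatrix be? lda specifies the number of rows between elements in the same column (100 in this case), but where would I specify the 90? The only way I can see is by adjusting *A.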

The function is defined as:

cublasStatus_t cublasSetMatrix(int rows, int cols, int elemSize, const void *A, int lda, void *B, int ldb)

I'm also guessing that, given the length constraints, I wouldn't be able to transfer the bottom-right 3x3 part of a 5x5 matrix.

Answer 1

You do have to "adjust *A", as you put it. The pointer passed to the function must point to the first entry of the desired sub-matrix.
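As a minimal sketch of that pointer adjustment, assuming the column-major storage that cuBLAS uses: the helper name columnMajorOffset is hypothetical and not part of cuBLAS. It only computes how many elements to add to the base pointer to reach entry (row, col).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of cuBLAS): linear offset of element
// (row, col) in a column-major matrix whose leading dimension is ld.
inline std::size_t columnMajorOffset(int row, int col, int ld) {
    // The leading dimension must cover at least every row index in use.
    assert(row >= 0 && col >= 0 && ld > 0 && row < ld);
    return static_cast<std::size_t>(row) + static_cast<std::size_t>(col) * ld;
}
```

For a 100x100 matrix (lda = 100), a block that starts at row 90, column 0 begins at A + columnMajorOffset(90, 0, 100), which is A + 90. A block that starts at row 0, column 90 begins at A + 9000.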

You did not say whether your matrix A is actually the input or the output matrix, but conceptually this should not change much.

Assume you have the following code:

// The matrix in host memory
int rowsA = 100;
int colsA = 100;
float *A = new float[rowsA*colsA];

// Fill A with values
...

// The sub-matrix that should be copied to the device.
// The minimum index is INCLUSIVE
// The maximum index is EXCLUSIVE
int minRowA = 0;
int maxRowA = 100;
int minColA = 90;
int maxColA = 100;
int rowsB = maxRowA-minRowA;
int colsB = maxColA-minColA;

// Allocate the device matrix
float *dB = nullptr;
cudaMalloc(&dB, rowsB * colsB * sizeof(float));

Then, for the cublasSetMatrix call, you have to compute the starting element of the source matrix:

float *sourceA = A + (minRowA + minColA * rowsA);
cublasSetMatrix(rowsB, colsB, sizeof(float), sourceA, rowsA, dB, rowsB);

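This is where the 90 you asked about comes into play: it is the minColA in the computation of the source pointer. Note that this example selects columns 90 to 99 of all rows. If A(90:99, 0:99) is meant as rows 90 to 99 of all columns, set minRowA = 90, maxRowA = 100, minColA = 0 and maxColA = 100 instead; the same pointer computation applies.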
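The 5x5 case from the question can be checked on the host without a GPU. The sketch below is a CPU stand-in for the copy pattern that cublasSetMatrix is documented to perform (a rows x cols tile between column-major arrays with leading dimensions lda and ldb). The names copyTileHost and bottomRight3x3Of5x5 are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for the copy pattern of cublasSetMatrix:
// copies a rows x cols tile between two column-major float arrays.
// src points at the first element of the tile; lda and ldb are the
// leading dimensions (allocated row counts) of source and destination.
inline void copyTileHost(int rows, int cols,
                         const float *src, int lda,
                         float *dst, int ldb) {
    assert(lda >= rows && ldb >= rows);
    for (int c = 0; c < cols; ++c) {
        for (int r = 0; r < rows; ++r) {
            dst[r + c * ldb] = src[r + c * lda];
        }
    }
}

// Extracts the bottom-right 3x3 block of a 5x5 column-major matrix
// whose entry (r, c) holds the value 10*r + c.
inline std::vector<float> bottomRight3x3Of5x5() {
    const int n = 5;
    std::vector<float> A(n * n);
    for (int c = 0; c < n; ++c) {
        for (int r = 0; r < n; ++r) {
            A[r + c * n] = 10.0f * r + c;
        }
    }

    std::vector<float> B(3 * 3);
    // Start at element (2, 2); lda stays 5 so each column step skips 5 floats.
    copyTileHost(3, 3, A.data() + (2 + 2 * n), n, B.data(), 3);
    return B;
}
```

Under the same assumptions, the corresponding device transfer would be cublasSetMatrix(3, 3, sizeof(float), A + 12, 5, dB, 3), so the 3x3 transfer guessed to be impossible in the question appears feasible: lda = 5 is still at least the 3 rows being copied.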
