Armadillo SpMat<int> extremely slow compared to Mat<int>

Question

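I am trying to use sparse matrices in Armadillo, and I have noticed a significant difference in access times with SpMat<int> compared to equivalent code that uses Mat<int>.

Description:

Below are two methods that are identical in every respect, except that Method_One uses regular (dense) matrices and Method_Two uses sparse matrices.

Both methods take the following arguments:

WS, DS: pointers to arrays of length NN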
WW: 13K [max(WS)]
DD: 1.7 K [max(DS)]
NN: 2.3 M
TT: 50

I am using Visual Studio 2017 to compile the code into a .mexw64 executable that can be called from Matlab.

Code:

void Method_One(int WW, int DD, int TT, int NN, double* WS, double* DS)
{
    Mat<int> WP(WW, TT, fill::zeros); // (13000 x 50) matrix
    Mat<int> DP(DD, TT, fill::zeros); // (1700  x 50) matrix
    Col<int> ZZ(NN, fill::zeros);     // 2,300,000 column vector

    for (int n = 0; n < NN; n++)
    {
        int w_n = (int) WS[n] - 1;
        int d_n = (int) DS[n] - 1;
        int t_n = rand() % TT;

        WP(w_n, t_n)++;
        DP(d_n, t_n)++;
        ZZ(n) = t_n + 1;
    }
    return;
}

void Method_Two(int WW, int DD, int TT, int NN, double* WS, double* DS)
{
    SpMat<int> WP(WW, TT);        // (13000 x 50) matrix
    SpMat<int> DP(DD, TT);        // (1700  x 50) matrix
    Col<int> ZZ(NN, fill::zeros); // 2,300,000 column vector

    for (int n = 0; n < NN; n++)
    {
        int w_n = (int) WS[n] - 1;
        int d_n = (int) DS[n] - 1;
        int t_n = rand() % TT;

        WP(w_n, t_n)++;
        DP(d_n, t_n)++;
        ZZ(n) = t_n + 1;
    }
    return;
}

Timing:

I am timing both methods with Armadillo's wall_clock timer object. For example:

wall_clock timer;
timer.tic();
Method_One(WW, DD, TT, NN, WS, DS);
double t = timer.toc();

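Results:

Time elapsed for Method_One using Mat<int>: 0.091 sec
Time elapsed for Method_Two using SpMat<int>: 30.227 sec (over 300 times slower)

Any insight into this is much appreciated!

Update:

This issue was resolved by updating to a newer version of Armadillo (8.100.1).

Here are the new results:

Time elapsed for Method_One using Mat<int>: 0.141 sec
Time elapsed for Method_Two using SpMat<int>: 2.127 sec (15 times slower, which is acceptable!)

Thanks to Conrad and Ryan.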

Answer 1

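Sparse matrices are stored in a compressed format (CSC). Every time a nonzero element is inserted into a sparse matrix, the entire internal representation has to be updated. This is time-consuming.

It is much faster to construct the sparse matrix using the batch constructors.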
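
To make the idea concrete, here is a minimal sketch of what batch construction amounts to: record every increment as a (row, col, value) triplet, then sort once and sum the entries that share a position. It uses only the standard library; the names Triplet and merge_duplicates are made up for this illustration and are not part of Armadillo. As far as I know, Armadillo's batch constructor that takes an add_values flag together with a locations matrix and a values vector does this merging for you, so check the documentation of your version before relying on that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One recorded increment: matrix position plus the amount to add.
// (Hypothetical helper type for this sketch, not an Armadillo type.)
struct Triplet {
    std::size_t row;
    std::size_t col;
    int value;
};

// Sorts the triplets in column-major order and sums entries that share a
// position. The cost is O(N log N) for N recorded increments, paid once,
// instead of a potential shift of the whole storage on every insertion.
std::vector<Triplet> merge_duplicates(std::vector<Triplet> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Triplet& a, const Triplet& b) {
                  return std::tie(a.col, a.row) < std::tie(b.col, b.row);
              });

    std::vector<Triplet> merged;
    for (const Triplet& e : entries) {
        if (!merged.empty() && merged.back().row == e.row &&
            merged.back().col == e.col) {
            merged.back().value += e.value;  // same position: accumulate
        } else {
            merged.push_back(e);             // first entry at this position
        }
    }
    assert(merged.size() <= entries.size());
    return merged;
}
```

In the question's loop, this would mean pushing {w_n, t_n, 1} into a vector instead of calling WP(w_n, t_n)++, and building the sparse matrix once after the loop.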

Answer 2

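As hbrerkere already mentioned, the problem stems from the fact that the values of the matrix are stored in a packed format (CSC), which makes it time-consuming to:

1. Find the index of an already existing entry: depending on whether the column entries are sorted by row index, you need either a linear or a binary search.
2. Insert a value that was previously zero: here you need to find the insertion point for the new value and move all elements after that point, leading to Ω(n) worst-case time for a single insertion, where n is the number of stored nonzeros!

All of these operations are constant-time operations for dense matrices, which mostly explains the difference in runtime.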
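
The two costs above can be seen in a small, self-contained sketch of a CSC matrix. This is an illustration written for this answer under the assumption that row indices are kept sorted within each column; it is not Armadillo's actual implementation, and the names CscMatrix and add are made up. The add function returns how many stored entries had to be moved, so the linear cost of inserting a new nonzero is visible directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSC matrix of ints (illustrative only).
struct CscMatrix {
    std::size_t n_rows, n_cols;
    std::vector<int> values;           // nonzero values, column by column
    std::vector<std::size_t> row_idx;  // row index of each stored value
    std::vector<std::size_t> col_ptr;  // column c occupies [col_ptr[c], col_ptr[c+1])

    CscMatrix(std::size_t r, std::size_t c)
        : n_rows(r), n_cols(c), col_ptr(c + 1, 0) {}

    // Adds delta to element (r, c) and returns the number of stored
    // entries that had to be shifted to make room (0 for an existing entry).
    std::size_t add(std::size_t r, std::size_t c, int delta) {
        assert(r < n_rows && c < n_cols);
        const auto first = row_idx.begin() + static_cast<std::ptrdiff_t>(col_ptr[c]);
        const auto last = row_idx.begin() + static_cast<std::ptrdiff_t>(col_ptr[c + 1]);

        // Cost 1: binary search for the row index inside the sorted column.
        const auto it = std::lower_bound(first, last, r);
        const std::ptrdiff_t pos = it - row_idx.begin();
        if (it != last && *it == r) {
            values[static_cast<std::size_t>(pos)] += delta;  // update in place
            return 0;
        }

        // Cost 2: a new nonzero moves every entry stored after it and
        // bumps the start offset of every later column.
        const std::size_t shifted = values.size() - static_cast<std::size_t>(pos);
        row_idx.insert(it, r);
        values.insert(values.begin() + pos, delta);
        for (std::size_t k = c + 1; k <= n_cols; ++k) {
            ++col_ptr[k];
        }
        return shifted;
    }
};
```

With 2.3 million increments spread over a 13000 x 50 matrix, many early calls hit the second branch, which is where a per-element access pattern like WP(w_n, t_n)++ loses most of its time.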

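My usual solution is to use a separate sparse matrix type for assembly (where you typically access an element multiple times). It is based on the coordinate format (storing triplets (i, j, value)) and uses a map such as std::map or std::unordered_map to store the index of the triplet that corresponds to position (i, j) in the matrix.

Some similar approaches are also discussed in this question about matrix assembly.

An example I recently used: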

class DynamicSparseMatrix {
    using Number = double;
    using Index = std::size_t;
    using Entry = std::pair<Index, Index>;
    std::vector<Number> values;
    std::vector<Index> rows;
    std::vector<Index> cols;
    std::map<Entry, Index> map; // unordered_map might be faster,
                                // but you need a suitable hash function
                                // like boost::hash<Entry> for this.
    Index num_rows;
    Index num_cols;

    ...

    Number& value(Index row, Index col) {
        // just to prevent misuse
        // (Index is unsigned, so only the upper bounds need checking)
        assert(row < num_rows);
        assert(col < num_cols);
        // Find the entry in the matrix
        Entry e{row, col};
        auto it = map.find(e);
        // If the entry hasn't previously been stored
        if (it == map.end()) {
            // Add a new entry by adding its value and coordinates
            // to the end of the storage vectors.
            it = map.insert(std::make_pair(e, values.size())).first;
            rows.push_back(row);
            cols.push_back(col);
            values.push_back(0);
        }
        // Return the value
        return values[(*it).second];
    }

    ...

};

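After assembly, you can take all the values from rows, cols and values (which in effect represent the matrix in coordinate format), possibly sort them, and do a batch insertion into your Armadillo matrix.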
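
The sort-then-insert step could look roughly like the sketch below, which turns the three coordinate-format vectors into CSC arrays (values, row indices, column pointers). It assumes each position appears at most once, which the map in DynamicSparseMatrix guarantees. CscArrays and to_csc are names invented for this sketch. I believe Armadillo offers a batch constructor that accepts row indices, column pointers and values in this layout, but treat that as an assumption and check the documentation for the exact argument types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Plain CSC arrays (hypothetical helper type for this sketch).
struct CscArrays {
    std::vector<double> values;
    std::vector<std::size_t> row_idx;
    std::vector<std::size_t> col_ptr;  // size num_cols + 1
};

// Converts coordinate-format storage (unique positions, any order) to CSC.
CscArrays to_csc(const std::vector<std::size_t>& rows,
                 const std::vector<std::size_t>& cols,
                 const std::vector<double>& values,
                 std::size_t num_cols) {
    assert(rows.size() == cols.size() && cols.size() == values.size());

    // Order the entry indices by (column, row), which is the CSC storage order.
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return std::make_pair(cols[a], rows[a]) < std::make_pair(cols[b], rows[b]);
    });

    CscArrays out;
    out.col_ptr.assign(num_cols + 1, 0);
    for (std::size_t k : order) {
        assert(cols[k] < num_cols);
        out.values.push_back(values[k]);
        out.row_idx.push_back(rows[k]);
        ++out.col_ptr[cols[k] + 1];  // count the entries in each column
    }
    // A prefix sum turns the per-column counts into column start offsets.
    std::partial_sum(out.col_ptr.begin(), out.col_ptr.end(), out.col_ptr.begin());
    return out;
}
```

The whole conversion is one O(n log n) sort plus linear passes, so it is paid once after assembly rather than on every element access.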
