Does the speed of filling a sparse matrix in Eigen depend on the number of nodes or the number of edges?

Question

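I am filling sparse adjacency matrices with the edges of two networks.

One has about 4000 nodes and 80000 edges.

The other has about 80000 nodes and 1300000 edges.

The code is as follows: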

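// nodenumber is 4000 or 80000
Eigen::SparseMatrix<int, Eigen::RowMajor> mat(nodenumber, nodenumber);
// reserve room for 50 non-zero elements per row
mat.reserve(Eigen::VectorXi::Constant(nodenumber, 50));
// edges is assumed to be a container of (i, j) pairs, e.g. std::vector<std::pair<int,int>>
for (const auto& e : edges) {
    mat.insert(e.first, e.second) = 1;
    mat.insert(e.second, e.first) = 1;
}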

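The network with 4000 nodes and 80000 edges is filled in 1.5 seconds.

The network with 80000 nodes and 1300000 edges is filled in 600 seconds.

But I think the time needed to fill the matrix should depend on the number of edges,

so the network with 80000 nodes and 1300000 edges should take about 1.5*1300000/80000 seconds.

Am I right or wrong?

How can I speed up filling the matrix?

Thanks!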

Answer 1

Look at this line: mat.reserve(VectorXi::Constant(nodenumber,50)); and at this note in the Eigen documentation on sparse matrices:

Note that when calling reserve(), it is not required that nnz is the exact number of nonzero elements in the final matrix. However, an exact estimation will avoid multiple reallocations during the insertion phase.

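So consider changing 50 to a value at least as large as the number of edges touching any single node (the non-zero count of the fullest row), to reduce reallocations. Even so, this will only slightly reduce the wall-clock time, as explained in the section "Filling a sparse matrix":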

Because of the special storage scheme of a SparseMatrix, special care has to be taken when adding new nonzero entries. For instance, the cost of a single purely random insertion into a SparseMatrix is O(nnz), where nnz is the current number of non-zero coefficients.

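Hence, filling the whole matrix by random insertion costs about nnz^2/2 operations, i.e. O(nnz^2). Indeed, 80000^2 / 1300000^2 is about 0.0038, which is not far from 1.5/600 = 0.0025, the ratio of the execution times you reported.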
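As a rough check of that estimate, the two scaling laws can be compared against the reported timings. This is only back-of-the-envelope arithmetic using the edge counts and times from the question; the symbols T_1, T_2, E_1, E_2 are introduced here for illustration. The stored non-zero count is twice the edge count in both networks, so the factor of 2 cancels in the ratio.

```latex
\[
\frac{E_2}{E_1} = \frac{1300000}{80000} = 16.25
\]
\[
\text{linear in edges: } T_2 \approx 1.5\,\mathrm{s} \times 16.25 \approx 24\,\mathrm{s}
\]
\[
\text{quadratic in edges: } T_2 \approx 1.5\,\mathrm{s} \times 16.25^2 \approx 1.5\,\mathrm{s} \times 264 \approx 396\,\mathrm{s}
\]
\[
\text{measured: } T_2 = 600\,\mathrm{s}, \qquad \frac{T_2}{T_1} = \frac{600}{1.5} = 400
\]
```

The measured ratio of 400 is far closer to the quadratic prediction (about 264) than to the linear one (16.25).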

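To save time, you may be interested in batch insertion, that is, inserting all the edges at once. Read that part of the Eigen documentation: it is really worth it! The following code from that page may help you: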

typedef Eigen::Triplet<double> T;
std::vector<T> tripletList;
tripletList.reserve(nnz);
for(...)
{
    // ...
    tripletList.push_back(T(i,j,v_ij));
}
SparseMatrixType mat(rows,cols);
mat.setFromTriplets(tripletList.begin(), tripletList.end());

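As an alternative, you can also reserve storage space for each column (for each row in a RowMajor matrix such as yours), if you know the maximum number of non-zero elements per column and it is not too large: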

mat.reserve(VectorXi::Constant(cols,6));
