Memory-efficient way to zero out the diagonal of a sparse matrix in R

Question

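I want to zero out the diagonal of a sparse matrix in R. My brute-force approach is to set it to zero explicitly, but this seems inefficient. Is there a more efficient way?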

require(Matrix)
A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
diag(A) <- 0
A <- drop0(A)  # cleaning up

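Clarification and resolution: my original concern was that Matrix would bloat the sparse matrix with explicit zeros stored on the diagonal. That is not the case (at least not in the end, although it does happen in the interim; see the comments below). To see this, consider what happens if we set the diagonal to 1: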

A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
format(object.size(A), units = "Mb")

[1] "38.3 Mb"

diag(A) <- 1
format(object.size(A), units = "Mb")

[1] "152.7 Mb"
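The two sizes reported above can be roughly reproduced by hand. This is a back-of-the-envelope sketch, assuming the matrix is stored in compressed column form (dgCMatrix) with an 8-byte double and a 4-byte integer row index per stored entry, plus a 4-byte column pointer vector of length n + 1, and ignoring the small fixed overhead of the object. The variable names are made up for the illustration.

```r
n   <- 1e7   # matrix dimension
nnz <- 1e4   # stored nonzero entries before touching the diagonal

bytes_per_entry <- 8 + 4          # double value + integer row index
pointer_bytes   <- 4 * (n + 1)    # column pointers, one per column plus one

# Size before: pointers dominate, the 1e4 entries are negligible.
base_mb <- (pointer_bytes + bytes_per_entry * nnz) / 1024^2

# Extra size after diag(A) <- 1: one stored entry per diagonal element.
diag_mb <- bytes_per_entry * n / 1024^2

round(base_mb, 1)            # 38.3
round(base_mb + diag_mb, 1)  # 152.7
```

Under these assumptions, almost all of the original 38.3 Mb is the column pointer vector, and the extra 114.4 Mb is the 1e7 stored diagonal entries.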

The nonzero elements we added take up O(n) memory, where n is the dimension of the matrix. With diag(A) <- 0, however, we get:

diag(A) <- 0
format(object.size(A), units = "Mb")

[1] "38.3 Mb"

In other words, Matrix already handles this case efficiently.
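The same behaviour can be checked on a matrix small enough to inspect by eye. This is a minimal sketch, assuming the result stays a dgCMatrix so that the number of stored entries can be read from the length of its x slot; the example values are arbitrary.

```r
library(Matrix)

# Small illustrative matrix: two entries on the diagonal, two off it.
A <- sparseMatrix(i = c(1, 2, 3, 1), j = c(1, 2, 1, 3),
                  x = c(5, 6, 7, 8), dims = c(3, 3))

B <- A
diag(B) <- 0
B <- drop0(B)   # drop any zeros that are still stored explicitly

stored_before <- length(A@x)   # entries physically stored in A
stored_after  <- length(B@x)   # entries physically stored in B
diag_is_zero  <- all(diag(B) == 0)
```

If Matrix kept the diagonal as explicit zeros even after drop0(), stored_after would not fall below stored_before.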

Answer 1

You can find the nonzero entries very quickly:

ij <- which(A != 0, arr.ind = TRUE)

# Subset to those on the diagonal:

ij <- ij[ij[,1] == ij[,2],,drop = FALSE]

# And set those entries to zero:

A[ij] <- 0
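The three steps above can be wrapped in a small function and checked against diag(A) <- 0 on a tiny matrix. This is only a sketch: zero_diag is a hypothetical helper name, and the guard for an empty index set and the final drop0() call are cautious additions that are not part of the steps as originally written.

```r
library(Matrix)

# zero_diag() is a hypothetical helper name introduced for this sketch.
zero_diag <- function(A) {
  # Positions of all nonzero entries, as a two-column (row, col) matrix.
  ij <- which(A != 0, arr.ind = TRUE)
  # Keep only the positions that lie on the diagonal.
  ij <- ij[ij[, 1] == ij[, 2], , drop = FALSE]
  # Assumption: skip the assignment when no diagonal entry is nonzero.
  if (nrow(ij) > 0) A[ij] <- 0
  # Remove any zeros that are still stored explicitly.
  drop0(A)
}

A <- sparseMatrix(i = c(1, 2, 3, 1), j = c(1, 2, 1, 3),
                  x = c(5, 6, 7, 8), dims = c(3, 3))
B <- zero_diag(A)

# Reference result from the brute-force approach.
C <- A
diag(C) <- 0
same_result <- identical(as.matrix(B), as.matrix(C))
```

Only the nonzero diagonal positions are ever touched, so the work scales with the number of stored entries rather than with the matrix dimension.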

Edited to add:

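As the edit to the original question says, this does not end up saving much memory, but it is faster. The diag(A) <- 0 statement takes about 3.2 seconds on my computer, whereas these 3 steps take about 0.2 seconds. This is how I timed it: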

library(microbenchmark)
microbenchmark(
  # Matrix creation only (baseline)
  A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix"),
  # Creation plus diag(A) <- 0
  {A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
   diag(A) <- 0},
  # Creation plus the three steps above
  {A <- as(rsparsematrix(nrow = 1e7, ncol = 1e7, nnz = 1e4), "sparseMatrix")
   ij <- which(A != 0, arr.ind = TRUE)
   ij <- ij[ij[,1] == ij[,2],,drop = FALSE]
   A[ij] <- 0},
  times = 10)

When I run it, I see a median time of 137 ms for matrix creation alone, 3351 ms for creation plus the diag(A) <- 0 call, and 319 ms for creation followed by my code.

It also saves a lot of memory in the intermediate steps, which you can see with memory profiling: Rprof(memory=TRUE); run code ; Rprof(NULL); summaryRprof().
