R:改变稀疏矩阵的对角线非常慢

R: Changing diagonal of sparse matrix is very slow

我有一个主对角线上有零的稀疏矩阵,我想将其更改为零,但与 QR 分解相比,该操作非常非常慢:

mat <- matrix(c(0,1,1,1,0,1,1,1,0),ncol=3)
mat1 <- Matrix::bdiag(mat,mat,mat)
mat2 <- Matrix::bdiag(mat,mat,mat)
identity_mat <- Matrix::Diagonal(9)

microbenchmark::microbenchmark(
  qr(mat1),
  Matrix::diag(mat2) <- 1,
  mat1 + identity_mat
)

结果

Unit: microseconds
                    expr      min        lq       mean    median        uq      max neval
                qr(mat1)   55.825   69.0080   79.16561   72.9365   85.6095  149.676   100
 Matrix::diag(mat2) <- 1  302.172  326.2365  379.60509  364.1985  401.8005  756.477   100
     mat1 + identity_mat 1714.578 1762.8665 2006.50270 1974.4125 2073.1795 6671.644   100

如何更快地将对角线设置为 1?

这个稍微快一点(它使用三元组稀疏矩阵而不是压缩矩阵)。

N <- 3
mat <- matrix(c(0,1,1,1,0,1,1,1,0),ncol=3)
mat1 <- do.call(Matrix::bdiag, replicate(N, mat, simplify = FALSE))
mat2 <- do.call(Matrix::bdiag, replicate(N, mat, simplify = FALSE))
mat3 <- Matrix::.bdiag(replicate(N, mat, simplify = FALSE))
identity_mat <- Matrix::Diagonal(3*N)

microbenchmark::microbenchmark(
  qr(mat1),
  Matrix::diag(mat2) <- 1,
  Matrix::diag(mat3) <- 1,
  mat1 + identity_mat
)

#> Unit: microseconds
#>                     expr      min        lq       mean    median        uq       max neval
#>                 qr(mat1)   50.519   65.8000   83.40258   74.1075   84.9095   451.866   100
#>  Matrix::diag(mat2) <- 1  266.200  318.6375  452.58706  338.8715  405.3270  5460.654   100
#>  Matrix::diag(mat3) <- 1  164.340  181.7700  246.14324  204.1055  235.4700  3083.771   100
#>      mat1 + identity_mat 1519.636 1739.8940 2297.10306 1863.0430 2251.7720 18617.782   100

对于更大的矩阵,这些几乎不需要更长的时间(下面是针对 N = 300 的)这让我想知道它是否只是使 S4 对象变慢(可能有很多验证正在进行背景)。

N <- 300

#> Unit: microseconds
#>                     expr        min         lq        mean      median          uq        max neval
#>                 qr(mat1) 239799.888 251484.867 260169.1626 257957.9940 265350.8880 321234.482   100
#>  Matrix::diag(mat2) <- 1    396.399    415.131    529.8535    495.5805    575.4920   2367.596   100
#>  Matrix::diag(mat3) <- 1    257.128    276.636    361.8436    322.2445    380.6375   2210.064   100
#>      mat1 + identity_mat   1605.454   1692.756   2176.5367   1833.2210   2000.9815  16803.231   100

如果您可以对您的矩阵做出假设,您也许可以破解它以更快地工作。特别是如果您要写入对角线的矩阵事先在对角线上没有条目(如您的示例所示),您可以这样做:

N <- 3
mat4 <- Matrix::.bdiag(replicate(N, mat, simplify = FALSE))
insert_diagonal <- function(m, d) {
  m@i <- c(m@i, 0:(d-1))
  m@j <- c(m@j, 0:(d-1))
  m@x <- c(m@x, rep(1, d))
  m
}
microbenchmark::microbenchmark(
  qr(mat1),
  Matrix::diag(mat2) <- 1,
  Matrix::diag(mat3) <- 1,
  insert_diagonal(mat4, 3*N)
)
#> Unit: microseconds
#>                          expr     min       lq      mean   median       uq     max neval
#>                      qr(mat1)  63.885  81.0315  97.14267  90.4660  99.4635 413.534   100
#>       Matrix::diag(mat2) <- 1 325.229 368.2320 417.94677 408.6095 425.9595 755.734   100
#>       Matrix::diag(mat3) <- 1 195.907 212.6790 266.83832 249.9585 266.0280 796.030   100
#>  insert_diagonal(mat4, 3 * N)  23.676  30.2365  35.59022  35.5075  39.2745  62.028   100

这是一个使用 RcppArmadillo 的解决方案:

Rcpp::cppFunction(code="
arma::sp_mat set_unit_diagonal(arma::sp_mat& A) {
      A.diag().ones();
    return A;
}", depends="RcppArmadillo")

mat <- matrix(c(0,1,1,1,0,1,1,1,0),ncol=3)
mat1 <- Matrix::bdiag(mat,mat,mat)
mat2 <- Matrix::bdiag(mat,mat,mat)
mat3 <- Matrix::bdiag(mat,mat,mat)
identity_mat <- Matrix::Diagonal(9)

解决方法非常快:

Unit: microseconds
                    expr      min        lq       mean   median        uq       max neval
                qr(mat1)   52.374   62.4805   69.12798   65.747   75.4455   110.630   100
 Matrix::diag(mat2) <- 1  272.026  289.9080  323.98992  300.557  358.5355   419.486   100
     mat1 + identity_mat 1543.191 1620.2835 1913.68513 1637.990 1970.8930 13572.716   100
 set_unit_diagonal(mat3)    7.426   11.9100   14.00686   13.970   15.5955    26.484   100

你可以试试

`diag<-`(as.matrix(mat1), 1)

和基准测试

microbenchmark::microbenchmark(
  qr(mat1),
  Matrix::diag(mat2) <- 1,
  `diag<-`(as.matrix(mat1), 1)
)

给予

Unit: microseconds
                         expr   min     lq    mean median     uq   max neval
                     qr(mat1)  52.4  56.55  61.457  60.85  64.05 103.5   100
      Matrix::diag(mat2) <- 1 269.2 275.35 290.202 282.75 297.65 443.4   100
 `diag<-`(as.matrix(mat1), 1)  38.6  42.40  47.721  46.10  48.90 147.4   100

更新

microbenchmark::microbenchmark(
  qr(mat1),
  Matrix::diag(mat2) <- 1,
  as(`diag<-`(as.matrix(mat1), 1), "sparseMatrix")
)

显示

Unit: microseconds
                                             expr   min     lq    mean median
                                         qr(mat1)  50.6  56.55  61.592   60.2
                          Matrix::diag(mat2) <- 1 270.6 277.20 290.993  280.3
 as(`diag<-`(as.matrix(mat1), 1), "sparseMatrix")  96.6 105.45 111.285  112.1
     uq   max neval
  64.00 179.5   100
 294.35 492.3   100
 115.25 136.2   100