Multiplication of matrices along rows to get an array

I have a 2x4 matrix A and a 3x4 matrix B. I want to get an array C of dimension (2, 3, 4), where the ijk entry of C is the product of the ik entry of A and the jk entry of B.
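Written out element-wise, the requested array and its k-th slice are as follows. The slice form is the outer product that the `outer()`-based code exploits:

```latex
C_{ijk} = A_{ik}\,B_{jk}, \qquad i = 1, 2;\; j = 1, 2, 3;\; k = 1, \dots, 4

C_{\cdot\cdot k} = a_k\, b_k^{\top}, \qquad a_k = A_{\cdot k} \in \mathbb{R}^{2},\; b_k = B_{\cdot k} \in \mathbb{R}^{3}
```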

Is there a fast way to do this in R without loops? The example below computes what I'm looking for in two ways, but both involve loops:

A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

C1 <- array(NA, dim=c(2, 3, 4))
for (ii in 1:2) {
  for (jj in 1:3) {
    C1[ii, jj, ] <- A[ii, ] * B[jj, ]
  }
}

C2 <- array(NA, dim=c(2, 3, 4))
for (ss in 1:4) {
  C2[, , ss] <- outer(A[, ss], B[, ss])
}
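For comparison, here is a minimal loop-free sketch, assuming the same A and B: replicate the rows of A and B so that every (i, j) pair lines up in one row, multiply element-wise, and reshape. Because R stores arrays in column-major order, letting i vary fastest in the row index makes the reshape line up with `dim = c(nrow(A), nrow(B), ncol(A))`. The helper name `rowwise_outer` is hypothetical.

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

# Hypothetical helper: C[i, j, k] = A[i, k] * B[j, k] without explicit loops
rowwise_outer <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  na <- nrow(A)
  nb <- nrow(B)
  # Row r = i + na * (j - 1): i varies fastest, j slowest
  prod_rows <- A[rep(seq_len(na), times = nb), , drop = FALSE] *
               B[rep(seq_len(nb), each = na), , drop = FALSE]
  # Column-major reshape: element [r, k] lands at C[i, j, k]
  array(prod_rows, dim = c(na, nb, ncol(A)))
}

C_vec <- rowwise_outer(A, B)
C_vec[2, 3, 4]  # A[2, 4] * B[3, 4] = 8 * 22 = 176
```

The trade-off is memory: both replicated matrices are as large as C itself, so peak memory is roughly three times the size of the result.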

Not much of an improvement, but using abind:

C3 <- do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))
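If the extra package is the only reason to reach for abind, the same list of outer-product slices can be stacked in base R by concatenating them with `unlist()` and reshaping: each slice is stored column-major, and slice k simply follows slice k - 1. A minimal sketch, not benchmarked here:

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

# One nrow(A) x nrow(B) outer-product matrix per shared column k
slices <- lapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]))

# Concatenate the slices in order and reshape; no abind needed
C_base <- array(unlist(slices), dim = c(nrow(A), nrow(B), ncol(A)))
```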

Benchmark

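Note: the last time I ran this benchmark, it was on a different laptop running R-4.0.5. In that case, after running the benchmark several times, C1 performed on par with C2. I have since switched laptops (for other reasons), saw @jay.sf's answer and wanted to add it to the mix, and now C1 is clearly faster. I can't explain the difference, but the relevant specs are: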

  • Previous benchmark: Windows 10, R-4.0.5
  • This benchmark: Windows 11, R-4.1.2

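My sense is that the "turbulence" in the benchmarks comes from the small data size, where administrative overhead dominates the timings. With much larger matrices I would expect a clearer separation (unverified).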

bench::mark(
  C1 = {
  for (ii in 1 : 2) {
    for (jj in 1 : 3) {
      C1[ii, jj, ] <- A[ii, ] * B[jj, ]
    }
  }
  as.numeric(C1)
},
  C2 = {
  for (ss in 1 : 4) {
    C2[, , ss] <- outer(A[, ss], B[, ss])
  }
  as.numeric(C2)
},
  C3 = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))),
  C4 = as.numeric(vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3)))
)
# # A tibble: 4 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result     memory             time                gc                   
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>     <list>             <list>              <list>               
# 1 C1           16.6us   52.3us    18669.      240B     0     9317     0      499ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [9,317]>  <tibble [9,317 x 3]> 
# 2 C2           25.8us   85.6us    11225.      240B     2.14  5235     1      466ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [5,236]>  <tibble [5,236 x 3]> 
# 3 C3            763us  797.4us     1144.      720B     0      572     0      500ms <dbl [24]> <Rprofmem [3 x 3]> <bench_tm [572]>    <tibble [572 x 3]>   
# 4 C4           24.8us   36.7us    21752.      240B     2.18  9999     1      460ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [10,000]> <tibble [10,000 x 3]>

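(I added as.numeric to each of them because some return integers and some return doubles, and I didn't want to call as.numeric on only one of them and skew the benchmark. It isn't strictly needed for any of them, but now we can be confident that they are all equal; otherwise bench::mark would fail, complaining that the outputs differ.)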

I like @jay.sf's answer because it is both fast and needs no extra packages (abind is non-standard, but it is still quite handy).


Benchmark, take 2

Let's scale up the data a bit.

Abig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, A, simplify = FALSE)), simplify = FALSE))
Bbig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, B, simplify = FALSE)), simplify = FALSE))
C1big <- C2big <- array(dim=c(dim(Abig)[1], dim(Bbig)))
dim(Abig)
# [1] 100 200
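The nested `do.call(rbind, ...)` / `do.call(cbind, ...)` construction tiles a matrix 50 times down and 50 times across. An equivalent and somewhat shorter way, sketched here and not what was timed, is to index with repeated row and column indices. The helper name `tile` is hypothetical.

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

# Hypothetical helper: repeat matrix m `times` times down the rows and across the columns
tile <- function(m, times) {
  m[rep(seq_len(nrow(m)), times), rep(seq_len(ncol(m)), times), drop = FALSE]
}

Abig2 <- tile(A, 50)
Bbig2 <- tile(B, 50)
dim(Abig2)  # 100 200
dim(Bbig2)  # 150 200
```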

Now the benchmark on the larger data, again on Windows 11, R-4.1.2:

bench::mark(
  C1big = {
  for (ii in seq(nrow(Abig))) {
    for (jj in seq(nrow(Bbig))) {
      C1big[ii, jj, ] <- Abig[ii, ] * Bbig[jj, ]
    }
  }
  as.numeric(C1big)
},
  C2big = {
  for (ss in seq(ncol(Abig))) {
    C2big[, , ss] <- outer(Abig[, ss], Bbig[, ss])
  }
  as.numeric(C2big)
},
  C3big = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(Abig)), function(ss) outer(Abig[,ss], Bbig[,ss])), along=3))),
  C4big = as.numeric(vapply(seq(ncol(Abig)), function(ss) outer(Abig[, ss], Bbig[, ss]), matrix(0, nrow(Abig), nrow(Bbig)))),
  iterations = 30
)
# # A tibble: 4 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result            memory                  time            gc               
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>            <list>                  <list>          <list>           
# 1 C1big       100.7ms    135ms      7.38    83.5MB     2.68    22     8      2.98s <dbl [3,000,000]> <Rprofmem [75,001 x 3]> <bench_tm [30]> <tibble [30 x 3]>
# 2 C2big          22ms   24.4ms     36.0     46.8MB     7.20    25     5    694.7ms <dbl [3,000,000]> <Rprofmem [1,801 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
# 3 C3big        26.5ms   28.4ms     32.8     92.6MB    28.7     16    14   488.54ms <dbl [3,000,000]> <Rprofmem [1,632 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
# 4 C4big          10ms   17.4ms     61.2     46.7MB    18.6     23     7    375.9ms <dbl [3,000,000]> <Rprofmem [1,402 x 3]>  <bench_tm [30]> <tibble [30 x 3]>

It looks like we've reached what I expected: jay.sf's vapply implementation does well, and it leads on both `itr/sec` and `mem_alloc`, which are good indicators.

vapply() should be faster.
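Part of why vapply() does well is that `FUN.VALUE` declares the length and type of every result up front, which lets it allocate the result once and check each slice against the template; the result is an array with `dim = c(dim(FUN.VALUE), length(X))`. A generalized sketch that derives the template from the inputs instead of hard-coding `1:4` and `matrix(0, 2, 3)`; the wrapper name `outer_by_col` is hypothetical:

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

# Hypothetical wrapper: slice k of the result is outer(A[, k], B[, k])
outer_by_col <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  # Template: each call must return an nrow(A) x nrow(B) numeric matrix;
  # integer results are promoted to the template's double type
  template <- matrix(0, nrow(A), nrow(B))
  vapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]), template)
}

C_v <- outer_by_col(A, B)
dim(C_v)  # 2 3 4
```

Using `function(ss)` rather than the `\(ss)` shorthand also keeps it working on R versions before 4.1.0, where the backslash lambda syntax was introduced.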

vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3))
# , , 1
# 
#      [,1] [,2] [,3]
# [1,]   11   12   13
# [2,]   22   24   26
# 
# , , 2
# 
#      [,1] [,2] [,3]
# [1,]   42   45   48
# [2,]   56   60   64
# 
# , , 3
# 
#      [,1] [,2] [,3]
# [1,]   85   90   95
# [2,]  102  108  114
# 
# , , 4
# 
#      [,1] [,2] [,3]
# [1,]  140  147  154
# [2,]  160  168  176

sapply also works, but it is slower (same output):

sapply(1:4, \(ss) outer(A[, ss], B[, ss]), simplify='array')
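The two results print the same, but with these integer inputs they are not stored identically: sapply() keeps the integer type that `outer()` returns, while vapply() promotes to the double type of its `matrix(0, 2, 3)` template. This is the kind of integer/double mismatch that the as.numeric() wrapping in the other answer's benchmark guards against. A quick check:

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

s <- sapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]), simplify = "array")
v <- vapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3))

typeof(s)        # "integer"
typeof(v)        # "double"
all(s == v)      # TRUE: same values
identical(s, v)  # FALSE: storage types differ
```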

Benchmark:

AA <- matrix(1:8, 1e3, 4e2)
# 2e3 rows is not a multiple of length(11:22) = 12, so matrix() warns about recycling here
BB <- matrix(11:22, 2e3, 4e2)
microbenchmark::microbenchmark(
  # as written, only rows ii in 1:2 of AA and jj in 1:3 of BB are filled
  `for1`={C1 <- array(NA, dim=c(1e3, 2e3, 4e2))
  for (ii in 1:2) {
    for (jj in 1:3) {
      C1[ii, jj, ] <- AA[ii, ] * BB[jj, ]
    }
  }},
  # vapply and sapply iterate over slices 1:4 only; abind covers all ncol(AA) = 400
  vapply=vapply(1:4, \(ss) outer(AA[, ss], BB[, ss]), matrix(0, 1e3, 2e3)),
  abind=do.call(abind::abind, c(lapply(seq(ncol(AA)), function(ss) outer(AA[,ss], BB[,ss])), along=3)),
  sapply=sapply(1:4, \(ss) outer(AA[, ss], BB[, ss]), simplify='array'),
  times=3L, control=list(warmup=1e2L))
# Unit: milliseconds
#   expr         min          lq        mean     median         uq        max neval cld
#   for1  3766.87037  3792.71469  3837.77384  3818.5590  3873.2256  3927.8921     3  b 
# vapply    34.04493    68.17596    98.97098   102.3070   131.4340   160.5610     3 a  
#  abind 11736.37882 12063.20849 12320.85601 12390.0382 12613.0946 12836.1511     3   c
# sapply    58.41669    81.65372   139.44338   104.8907   179.9567   255.0227     3 a
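In that benchmark, `for1` fills only a 2 x 3 corner of the array, and the `vapply` and `sapply` entries compute only 4 of the 400 slices while `abind` computes all 400, so the timings are not directly comparable. Below is a minimal sketch of a like-for-like setup in which every method builds the full array. The sizes and random data are assumptions chosen so it runs quickly, no timings are claimed, and an abind entry can be added the same way.

```r
set.seed(1)
# Assumed sizes: small enough for a quick run
na <- 100; nb <- 200; p <- 50
AA <- matrix(runif(na * p), na, p)
BB <- matrix(runif(nb * p), nb, p)

# Each method returns the full nrow(A) x nrow(B) x ncol(A) array
by_loop <- function(A, B) {
  C <- array(NA_real_, dim = c(nrow(A), nrow(B), ncol(A)))
  for (ii in seq_len(nrow(A))) {
    for (jj in seq_len(nrow(B))) {
      C[ii, jj, ] <- A[ii, ] * B[jj, ]
    }
  }
  C
}
by_vapply <- function(A, B) {
  vapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]),
         matrix(0, nrow(A), nrow(B)))
}
by_sapply <- function(A, B) {
  sapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]), simplify = "array")
}

# Sanity check before timing: all methods agree
stopifnot(isTRUE(all.equal(by_loop(AA, BB), by_vapply(AA, BB))),
          isTRUE(all.equal(by_vapply(AA, BB), by_sapply(AA, BB))))

# Time only if microbenchmark is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    for_loop = by_loop(AA, BB),
    vapply   = by_vapply(AA, BB),
    sapply   = by_sapply(AA, BB),
    times = 10L
  ))
}
```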