Multiplication of matrices along rows to get an array
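I have a 2x4 matrix A and a 3x4 matrix B. I want to get an array C of dimensions (2, 3, 4), where the ijk entry of C is the product of the ik entry of A and the jk entry of B, i.e. C[i, j, k] = A[i, k] * B[j, k].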
Is there a fast way to do this in R without loops? The example below shows two ways to compute what I'm looking for; both involve loops:
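A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)
# Approach 1: loop over every pair of rows (ii of A, jj of B)
C1 <- array(NA, dim=c(2, 3, 4))
for (ii in 1:2) {
  for (jj in 1:3) {
    C1[ii, jj, ] <- A[ii, ] * B[jj, ]
  }
}
# Approach 2: loop over the shared column index; each slice is an outer product
C2 <- array(NA, dim=c(2, 3, 4))
for (ss in 1:4) {
  C2[, , ss] <- outer(A[, ss], B[, ss])
}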
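One loop-free option, offered as a sketch rather than something taken from the answers below: R stores arrays in column-major order, so if one vector holds A[i, k] and another holds B[j, k] at the position of cell (i, j, k), a single element-wise multiplication fills the whole array. The names `nA`, `nB`, `K`, `A_rep`, `B_rep` and `C_vec` are illustrative only, and the check against a loop assumes the example matrices from the question.

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)
nA <- nrow(A); nB <- nrow(B); K <- ncol(A)
stopifnot(ncol(B) == K)  # both matrices must share the column index k

# Column (j, k) of this nA x (nB * K) matrix is A[, k], so in column-major
# order the element at cell (i, j, k) is A[i, k].
A_rep <- A[, rep(seq_len(K), each = nB)]
# rep() drops the dim attribute; each B[j, k] is repeated nA times,
# so the element at cell (i, j, k) is B[j, k].
B_rep <- rep(B, each = nA)

C_vec <- A_rep * B_rep
dim(C_vec) <- c(nA, nB, K)

# Reference result from the nested loop in the question
C_loop <- array(NA, dim = c(nA, nB, K))
for (ii in seq_len(nA)) for (jj in seq_len(nB)) C_loop[ii, jj, ] <- A[ii, ] * B[jj, ]
all.equal(C_vec, C_loop)  # TRUE
```

At its peak this holds two full-size intermediates plus the result, so it trades memory for the absence of an R-level loop; how it compares with `vapply` on large inputs is untested here.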
Not much of an improvement, but using abind:
C3 <- do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))
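If the abind dependency is unwanted, the same stacking can be sketched in base R by concatenating the slices and reshaping. This assumes every `outer()` slice has the same `nrow(A)` x `nrow(B)` shape, which holds here; `slices` and `C3_base` are illustrative names.

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)
# One nrow(A) x nrow(B) outer product per shared column k
slices <- lapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]))
# unlist() concatenates the slices one after another in column-major order,
# so using the slice index as the third dimension gives C[i, j, k].
C3_base <- array(unlist(slices), dim = c(nrow(A), nrow(B), ncol(A)))
dim(C3_base)  # 2 3 4
```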
Benchmark
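Note: the last time I ran this benchmark, it was on a different laptop running R-4.0.5, and across several runs C1 performed about the same as C2. I have since switched laptops (for unrelated reasons), saw @jay.sf's answer and wanted to add it to the mix, and now C1 is noticeably faster. I can't explain the difference, but the relevant specs are: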
- previous benchmark: Windows 10, R-4.0.5
- this benchmark: Windows 11, R-4.1.2
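I suspect the volatility in this benchmark comes from the small data size, where fixed overhead dominates the timings. I would expect a clearer separation with much larger matrices (untested).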
bench::mark(
  C1 = {
    for (ii in 1:2) {
      for (jj in 1:3) {
        C1[ii, jj, ] <- A[ii, ] * B[jj, ]
      }
    }
    as.numeric(C1)
  },
  C2 = {
    for (ss in 1:4) {
      C2[, , ss] <- outer(A[, ss], B[, ss])
    }
    as.numeric(C2)
  },
  C3 = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))),
  # \(ss) is the shorthand for function(ss), available since R 4.1.0
  C4 = as.numeric(vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3)))
)
# # A tibble: 4 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result     memory             time                gc
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>     <list>             <list>              <list>
# 1 C1           16.6us   52.3us    18669.      240B        0  9317     0      499ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [9,317]>  <tibble [9,317 x 3]>
# 2 C2           25.8us   85.6us    11225.      240B     2.14  5235     1      466ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [5,236]>  <tibble [5,236 x 3]>
# 3 C3            763us  797.4us     1144.      720B        0   572     0      500ms <dbl [24]> <Rprofmem [3 x 3]> <bench_tm [572]>    <tibble [572 x 3]>
# 4 C4           24.8us   36.7us    21752.      240B     2.18  9999     1      460ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [10,000]> <tibble [10,000 x 3]>
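(I wrapped each expression in as.numeric because some return integers and some return doubles, and I didn't want to apply as.numeric to only one of them and bias the benchmark. It isn't strictly required for any of them, but now we can be confident that they all return equal results; otherwise bench::mark would fail with a complaint that the outputs differ.)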
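The integer/double mismatch behind the `as.numeric()` wrapping can be seen in a minimal example: element-wise `*` on two integer vectors stays integer, while `vapply()` with a double `FUN.VALUE` promotes its results to double, and `identical()` treats the two as different even though the values agree. `x_int` and `x_dbl` are illustrative names.

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)
# integer * integer stays integer
x_int <- A[1, ] * B[1, ]
# vapply() promotes each integer result to the double type of FUN.VALUE
x_dbl <- vapply(1:4, function(ss) A[1, ss] * B[1, ss], numeric(1))
typeof(x_int)                                    # "integer"
typeof(x_dbl)                                    # "double"
identical(x_int, x_dbl)                          # FALSE: same values, different storage type
identical(as.numeric(x_int), as.numeric(x_dbl))  # TRUE once both are double
```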
I like @jay.sf's answer because it is fast and needs no extra packages (abind is not part of base R, but it is handy nonetheless).
Benchmark, take 2
Let's scale up the data a bit.
# Tile A and B: 50 copies stacked by rows, then 50 of those side by side
Abig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, A, simplify = FALSE)), simplify = FALSE))
Bbig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, B, simplify = FALSE)), simplify = FALSE))
# Pre-allocate the 100 x 150 x 200 result arrays used by the two loop versions
C1big <- C2big <- array(dim=c(dim(Abig)[1], dim(Bbig)))
dim(Abig)
# [1] 100 200
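The same tiling can be sketched with plain index repetition instead of nested `do.call()`/`replicate()` calls: repeating the row indices stacks copies vertically and repeating the column indices places copies side by side. `Abig_idx` is an illustrative name.

```r
A <- matrix(1:8, 2, 4)
# Tiling as in the answer: 50 copies stacked by rows, then 50 copies side by side
Abig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, A, simplify = FALSE)), simplify = FALSE))
# Same tiling by repeating row and column indices (stays integer, no dimnames)
Abig_idx <- A[rep(seq_len(nrow(A)), 50), rep(seq_len(ncol(A)), 50)]
identical(Abig_idx, Abig)  # TRUE
dim(Abig_idx)              # 100 200
```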
A different benchmark, again on Windows 11, R-4.1.2:
bench::mark(
  C1big = {
    for (ii in seq(nrow(Abig))) {
      for (jj in seq(nrow(Bbig))) {
        C1big[ii, jj, ] <- Abig[ii, ] * Bbig[jj, ]
      }
    }
    as.numeric(C1big)
  },
  C2big = {
    for (ss in seq(ncol(Abig))) {
      C2big[, , ss] <- outer(Abig[, ss], Bbig[, ss])
    }
    as.numeric(C2big)
  },
  C3big = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(Abig)), function(ss) outer(Abig[,ss], Bbig[,ss])), along=3))),
  C4big = as.numeric(vapply(seq(ncol(Abig)), function(ss) outer(Abig[, ss], Bbig[, ss]), matrix(0, nrow(Abig), nrow(Bbig)))),
  iterations = 30
)
# # A tibble: 4 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result            memory                  time            gc
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>            <list>                  <list>          <list>
# 1 C1big       100.7ms    135ms      7.38    83.5MB     2.68    22     8      2.98s <dbl [3,000,000]> <Rprofmem [75,001 x 3]> <bench_tm [30]> <tibble [30 x 3]>
# 2 C2big          22ms   24.4ms      36.0    46.8MB     7.20    25     5    694.7ms <dbl [3,000,000]> <Rprofmem [1,801 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
# 3 C3big        26.5ms   28.4ms      32.8    92.6MB     28.7    16    14   488.54ms <dbl [3,000,000]> <Rprofmem [1,632 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
# 4 C4big          10ms   17.4ms      61.2    46.7MB     18.6    23     7    375.9ms <dbl [3,000,000]> <Rprofmem [1,402 x 3]>  <bench_tm [30]> <tibble [30 x 3]>
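This matches my expectation: jay.sf's vapply implementation should do well, and it does, with the highest `itr/sec` and the lowest `mem_alloc` (only marginally below C2big), both of which are good indicators.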
vapply() should be fast:
vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3))
# , , 1
#
#      [,1] [,2] [,3]
# [1,]   11   12   13
# [2,]   22   24   26
#
# , , 2
#
#      [,1] [,2] [,3]
# [1,]   42   45   48
# [2,]   56   60   64
#
# , , 3
#
#      [,1] [,2] [,3]
# [1,]   85   90   95
# [2,]  102  108  114
#
# , , 4
#
#      [,1] [,2] [,3]
# [1,]  140  147  154
# [2,]  160  168  176
sapply also works, but it is slower (same output):
sapply(1:4, \(ss) outer(A[, ss], B[, ss]), simplify='array')
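To avoid hard-coding the `matrix(0, 2, 3)` template, the vapply call can be wrapped in a small function that derives `FUN.VALUE` from the inputs' dimensions. `row_outer_array` is a hypothetical name for illustration, not an existing function in base R or any package.

```r
# Hypothetical helper: C[i, j, k] = A[i, k] * B[j, k]
row_outer_array <- function(A, B) {
  stopifnot(is.matrix(A), is.matrix(B), ncol(A) == ncol(B))
  # FUN.VALUE fixes the shape and type of every slice; vapply() errors
  # if a slice comes back with a different length or an incompatible type.
  template <- matrix(0, nrow(A), nrow(B))
  vapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]), template)
}

A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)
C <- row_outer_array(A, B)
dim(C)  # 2 3 4
C[, , 4]
```

Using `seq_len(ncol(A))` instead of a literal `1:4` keeps the helper correct for any number of shared columns.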
Benchmark:
AA <- matrix(1:8, 1e3, 4e2)
BB <- matrix(11:22, 2e3, 4e2)
microbenchmark::microbenchmark(
  # note: fills only ii in 1:2 and jj in 1:3 of the pre-allocated 1e3 x 2e3 x 4e2 array
  `for1`={C1 <- array(NA, dim=c(1e3, 2e3, 4e2))
    for (ii in 1:2) {
      for (jj in 1:3) {
        C1[ii, jj, ] <- AA[ii, ] * BB[jj, ]
      }
    }},
  # note: vapply and sapply compute only the first 4 of the 400 slices
  vapply=vapply(1:4, \(ss) outer(AA[, ss], BB[, ss]), matrix(0, 1e3, 2e3)),
  # note: abind computes all seq(ncol(AA)) = 400 slices
  abind=do.call(abind::abind, c(lapply(seq(ncol(AA)), function(ss) outer(AA[,ss], BB[,ss])), along=3)),
  sapply=sapply(1:4, \(ss) outer(AA[, ss], BB[, ss]), simplify='array'),
  times=3L, control=list(warmup=1e2L))
# Unit: milliseconds
#    expr         min          lq        mean     median         uq        max neval cld
#    for1  3766.87037  3792.71469  3837.77384  3818.5590  3873.2256  3927.8921     3 b
#  vapply    34.04493    68.17596    98.97098   102.3070   131.4340   160.5610     3 a
#   abind 11736.37882 12063.20849 12320.85601 12390.0382 12613.0946 12836.1511     3 c
#  sapply    58.41669    81.65372   139.44338   104.8907   179.9567   255.0227     3 a
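Because the expressions in the benchmark above cover different portions of the result array, a like-for-like comparison needs every method to fill the entire array. The sketch below does that on hypothetical moderate sizes (200 x 50 and 300 x 50 random matrices), checks that all three results agree, and times a single run of each with base `system.time()`; the function names are illustrative and no timings are claimed here.

```r
# Hypothetical moderate sizes so that every method can compute the full array
set.seed(1)
A <- matrix(runif(200 * 50), 200, 50)
B <- matrix(runif(300 * 50), 300, 50)

full_for <- function(A, B) {
  C <- array(NA_real_, dim = c(nrow(A), nrow(B), ncol(A)))
  for (ii in seq_len(nrow(A))) {
    for (jj in seq_len(nrow(B))) {
      C[ii, jj, ] <- A[ii, ] * B[jj, ]
    }
  }
  C
}
full_vapply <- function(A, B) {
  vapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]),
         matrix(0, nrow(A), nrow(B)))
}
full_sapply <- function(A, B) {
  sapply(seq_len(ncol(A)), function(ss) outer(A[, ss], B[, ss]), simplify = "array")
}

# Check that all three produce the same full array before timing anything
res <- list(full_for(A, B), full_vapply(A, B), full_sapply(A, B))
stopifnot(isTRUE(all.equal(res[[1]], res[[2]])), isTRUE(all.equal(res[[2]], res[[3]])))

# Elapsed seconds for one run of each; repeat (or use bench/microbenchmark) for stable numbers
fns <- list(for_loop = full_for, vapply = full_vapply, sapply = full_sapply)
timings <- vapply(fns, function(f) system.time(f(A, B))[["elapsed"]], numeric(1))
timings
```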