Return all matrices of a given dimension with a certain number of ones and remaining zeros
Consider the following simplified example, which generates all possible 2 x 2 matrices with a single 1 and the remaining entries 0.
library(arrangements)
# define function
generate_matrices <- function(nrow, ncol, ones_count) {
  # every ordering of the cell values (duplicates included)
  vectors <- permutations(c(
    rep(1, ones_count),
    rep(0, nrow * ncol - ones_count)
  ))
  # remove redundancies (duplicate rows)
  vectors <- vectors[!duplicated(vectors), , drop = FALSE]
  # list of matrices: one per remaining row
  out <- list()
  for (i in seq_len(nrow(vectors))) {
    out[[i]] <- matrix(
      data = vectors[i, ],
      nrow = nrow,
      ncol = ncol,
      byrow = TRUE
    )
  }
  return(out)
}
Running the function to generate all 2 x 2 matrices with a single 1:
generate_matrices(nrow = 2, ncol = 2, ones_count = 1)
[[1]]
[,1] [,2]
[1,] 1 0
[2,] 0 0
[[2]]
[,1] [,2]
[1,] 0 1
[2,] 0 0
[[3]]
[,1] [,2]
[1,] 0 0
[2,] 1 0
[[4]]
[,1] [,2]
[1,] 0 0
[2,] 0 1
When I scale this up to 5 x 4 matrices with four 1s, I get an error:
generate_matrices(nrow = 5, ncol = 4, ones_count = 4)
# Error in permutations(c(rep(1, ones_count), rep(0, nrow * ncol - ones_count))) :
# too many results
My guess is that the line
vectors <- permutations(c(
rep(1, ones_count),
rep(0, nrow * ncol - ones_count)
))
takes too long to run and/or needs more memory than my laptop has to store the result. Is there a more efficient way to do this?
Eventually I would like to extend this to 6 x 5 matrices with four 1s and 8 x 5 matrices with eight 1s.
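A quick count shows why the permutations() call blows up. Assuming permutations() treats each of the nr * nc entries as distinct when no freq argument is given (which is consistent with the "too many results" error), it must enumerate (nr * nc)! orderings before duplicated() removes the repeats, while only choose(nr * nc, n1) distinct matrices exist. The base-R sketch below compares the two for the sizes mentioned above and gives a rough memory estimate, assuming 4-byte integer storage per cell.

```r
# Target sizes from the question: rows, columns, number of 1s
sizes <- data.frame(nr = c(2, 5, 6, 8),
                    nc = c(2, 4, 5, 5),
                    n1 = c(1, 4, 4, 8))
cells <- sizes$nr * sizes$nc

# Orderings enumerated if all nr * nc entries are treated as distinct
sizes$orderings <- factorial(cells)

# Distinct 0/1 matrices: choose which n1 of the cells hold a 1
sizes$distinct <- choose(cells, sizes$n1)

# Rough memory (GB) to store every distinct matrix as 4-byte integers
sizes$approx_gb <- sizes$distinct * cells * 4 / 1024^3
sizes
```

For the 8 x 5 case with eight 1s this gives 76,904,685 distinct matrices, about 11.5 GB even as integers, so enumerating lazily or in chunks may be necessary there whichever package is used.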
You can generate all possible 5 x 4 binary matrices with the function below, then use sum to keep only those with the required number of 1s:
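# all 2^(nrow * ncol) 0/1 matrices of the given size
f = function(nrow, ncol) lapply(asplit(do.call(expand.grid, rep(list(0:1), nrow * ncol)), 1), matrix, nrow, ncol)
mats = f(5, 4)
# keep those with exactly four 1s
mats[vapply(mats, sum, numeric(1)) == 4]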
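A possible variant, sketched under the assumption that the full 2^(nr * nc) grid still fits in memory (true for 5 x 4, not for 8 x 5): drop the rows with the wrong number of 1s using rowSums() before building any matrices, so matrix() is only called on the survivors. The function name binary_matrices_filtered is hypothetical.

```r
# Hypothetical variant of the expand.grid approach: filter first, build later.
# Still materialises all 2^(nr * nc) rows of the grid.
binary_matrices_filtered <- function(nr, nc, ones) {
  grid <- as.matrix(do.call(expand.grid, rep(list(0:1), nr * nc)))
  # keep only rows containing exactly `ones` 1s
  keep <- grid[rowSums(grid) == ones, , drop = FALSE]
  # each kept row becomes one nr x nc matrix (filled column-major)
  lapply(seq_len(nrow(keep)), function(i) matrix(keep[i, ], nr, nc))
}

res <- binary_matrices_filtered(2, 2, 1)
length(res)  # 4
```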
You can take the combinations of indices that should be set to 1:
m <- 2
n <- 2
k <- 2
createMatrix <- function(m, n, indices){
x <- matrix(0, m, n)
x[indices] <- 1
x
}
lapply(
combn(seq_len(m*n), k, simplify = FALSE),
function(x) createMatrix(m, n, x)
)
where m is the number of rows, n is the number of columns, and k is the number of ones.
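The createMatrix() helper works because x[indices] <- 1 uses linear, column-major indices. The small illustrative example below (the 3 x 2 size and the names rows and cols are arbitrary) shows the index-to-position rule and checks it against base R's arrayInd().

```r
# Illustrative 3 x 2 case
rows <- 3
cols <- 2
i <- seq_len(rows * cols)

# Column-major rule: linear index i -> (row, column)
manual <- cbind((i - 1) %% rows + 1, (i - 1) %/% rows + 1)

# base R's arrayInd() performs the same conversion
all(manual == arrayInd(i, c(rows, cols)))  # TRUE

# So the combination c(2, 5) puts both 1s in row 2
x <- matrix(0, rows, cols)
x[c(2, 5)] <- 1
x
```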
Using partitions::multiset and converting the result to an array of the appropriate dimensions seems more efficient.
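library(partitions)
f1 = function(nr, nc, n1){
  # one distinct permutation of the 0/1 multiset per column
  m = unclass(multiset(c(rep(0, nr*nc - n1), rep(1, n1))))
  # reinterpret each column as an nr x nc slice of a 3-D array
  `dim<-`(m, c(nr, nc, ncol(m)))
}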
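For readers without the partitions package, the same layout idea (one candidate per column, then reinterpreted as an nr x nc x K array with dim<-) can be sketched in base R by turning each combn() index set into an indicator column. f1_base() is a hypothetical stand-in, not part of any package, and its slices come out in combn()'s lexicographic order, which need not match multiset()'s order.

```r
# Hypothetical base-R stand-in for f1(): same "one candidate per column,
# then dim<-" layout, with 1-positions taken from combn()
f1_base <- function(nr, nc, n1) {
  cells <- nr * nc
  idx <- combn(cells, n1)                   # n1 x K matrix of 1-positions
  K <- ncol(idx)
  m <- matrix(0L, nrow = cells, ncol = K)   # column k = k-th candidate
  # Shift each column's positions into column k of m (linear indexing);
  # as.vector() stops R from treating idx as a (row, col) index matrix
  m[as.vector(idx) + rep((seq_len(K) - 1L) * cells, each = n1)] <- 1L
  `dim<-`(m, c(nr, nc, K))                  # column k becomes slice [, , k]
}

a <- f1_base(nr = 2, nc = 2, n1 = 1)
dim(a)  # 2 2 4
```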
f1(nr = 2, nc = 2, n1 = 1)
# , , 1
#
# [,1] [,2]
# [1,] 0 0
# [2,] 0 1
#
# , , 2
#
# [,1] [,2]
# [1,] 0 1
# [2,] 0 0
#
# , , 3
#
# [,1] [,2]
# [1,] 0 0
# [2,] 1 0
#
# , , 4
#
# [,1] [,2]
# [1,] 1 0
# [2,] 0 0
If needed, the array can easily be converted to a list:
a <- f1(nr = 2, nc = 2, n1 = 1)
asplit(a, MARGIN = 3)
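Benchmark
On larger data, multiset is much faster than the combn approach (which has to call matrix and [<- for every combination). Here are timings for 8 x 5 matrices with six 1s, which gives 3,838,380 matrices: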
f2 = function(m, n, k){lapply(
combn(seq_len(m*n), k, simplify = FALSE),
function(x) createMatrix(m, n, x))}
library(microbenchmark)
microbenchmark(
  f1(nr = 8, nc = 5, n1 = 6),
  f2(m = 8, n = 5, k = 6),
  times = 10L)
# Unit: milliseconds
# expr min lq mean median uq max neval
# f1(nr = 8, nc = 5, n1 = 6) 582.5020 680.5886 916.1864 802.1724 1531.4137 3132.456 10
# f2(m = 8, n = 5, k = 6) 20539.4030 22039.6975 24097.4683 24022.0033 1166.9455 2544.132 10
dim(f1(nr = 8, nc = 5, n1 = 6))
# [1] 8 5 3838380
length(f2(m = 8, n = 5, k = 6))
# [1] 3838380
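Since f1 and f2 return the same matrices in different orders and containers (array versus list), it can be worth confirming that they agree on a small size before comparing timings. The helper below is hypothetical and assumes 0/1 entries, so that pasting the cell values gives a unique key per matrix.

```r
# Hypothetical helper: TRUE when a 3-D array (one matrix per slice) and a
# list of matrices contain the same matrices, ignoring order
same_matrix_set <- function(arr, lst) {
  # one string key per matrix, built from its column-major cell values
  keys_arr <- apply(arr, 3, paste, collapse = "")
  keys_lst <- vapply(lst, paste, character(1), collapse = "")
  length(keys_arr) == length(keys_lst) &&
    identical(sort(unname(keys_arr)), sort(unname(keys_lst)))
}

arr <- array(c(1, 0, 0, 0,  0, 1, 0, 0), dim = c(2, 2, 2))
lst <- list(matrix(c(0, 1, 0, 0), 2), matrix(c(1, 0, 0, 0), 2))
same_matrix_set(arr, lst)  # TRUE: same two matrices, opposite order
```

With partitions loaded and the answer's createMatrix() and f2() defined, a call such as same_matrix_set(f1(4, 2, 3), f2(4, 2, 3)) would be expected to return TRUE.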
Unfortunately, with the inputs above, Maël's code fails on my PC ("cannot allocate vector of size ..."), probably because of the expand.grid "explosion".
Here is a one-liner using the package RcppAlgos (I am the author):
library(RcppAlgos)
nr = 2
nc = 2
n1 = 1
permuteGeneral(0:1, freqs = c(nr * nc - n1, n1),
               FUN = function(x) matrix(x, nr, nc))
[[1]]
[,1] [,2]
[1,] 0 0
[2,] 0 1
[[2]]
[,1] [,2]
[1,] 0 1
[2,] 0 0
[[3]]
[,1] [,2]
[1,] 0 0
[2,] 1 0
[[4]]
[,1] [,2]
[1,] 1 0
[2,] 0 0
The package also provides very flexible iterators via permuteIter. For example:
iter <- permuteIter(0:1, freqs = c(nr * nc - n1, n1),
                    FUN = function(x) matrix(x, nr, nc))
iter$nextIter()
[,1] [,2]
[1,] 0 0
[2,] 0 1
iter$back() ## Get the last one (or the first one via front())
[,1] [,2]
[1,] 1 0
[2,] 0 0
iter[[3]] ## Random access via [[
[,1] [,2]
[1,] 0 0
[2,] 1 0
iter[[c(1, 4)]]
[[1]]
[,1] [,2]
[1,] 0 0
[2,] 0 1
[[2]]
[,1] [,2]
[1,] 1 0
[2,] 0 0
iter$startOver() ## Reset the iterator
iter$nextNIter(3) ## Get the next n
[[1]]
[,1] [,2]
[1,] 0 0
[2,] 0 1
[[2]]
[,1] [,2]
[1,] 0 1
[2,] 0 0
[[3]]
[,1] [,2]
[1,] 0 0
[2,] 1 0
iter$prevIter() ## iterate backwards
[,1] [,2]
[1,] 0 1
[2,] 0 0
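An iterator is what makes the largest case (8 x 5 with eight 1s) tractable, since only one matrix is held in memory at a time. To illustrate the idea without RcppAlgos, here is a minimal forward-only sketch in base R: a closure that keeps the current combination of 1-positions and advances it in lexicographic order. make_matrix_iter() is hypothetical and only mimics nextIter()-style use (no back(), random access, or startOver()); being pure R, it will be much slower than the compiled iterator.

```r
# Hypothetical base-R iterator (not RcppAlgos's API): returns the next
# nr x nc matrix with n1 ones on each call, or NULL when exhausted
make_matrix_iter <- function(nr, nc, n1) {
  cells <- nr * nc
  current <- NULL                        # NULL means "not started yet"
  done <- FALSE
  function() {
    if (done) return(NULL)
    if (is.null(current)) {
      current <<- seq_len(n1)            # first combination: 1, 2, ..., n1
    } else {
      # rightmost position that can still be incremented
      i <- n1
      while (i >= 1 && current[i] == cells - n1 + i) i <- i - 1
      if (i < 1) {
        done <<- TRUE
        return(NULL)
      }
      current[i] <<- current[i] + 1
      # reset everything to the right to the smallest valid values
      if (i < n1) current[(i + 1):n1] <<- current[i] + seq_len(n1 - i)
    }
    x <- matrix(0L, nr, nc)
    x[current] <- 1L                     # column-major linear indices
    x
  }
}

it <- make_matrix_iter(2, 2, 1)
it()  # first matrix: the 1 is at position [1, 1]
```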
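The package is written in C++ and is generally very efficient, but the FUN argument is there purely for convenience (it calls an R function once per result).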
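If you are after raw speed, you are better off with the approach @Henrik provided: generate all permutations as a matrix and convert it to an array. Note that the order of the dimensions is different. This is because partitions::multiset returns one permutation per column, whereas RcppAlgos::permuteGeneral returns one permutation per row. The resulting arrays are isomorphic.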
fasterApproach <- function(nr, nc, n1) {
arr <- permuteGeneral(0:1, freqs = c(nr * nc - n1, n1))
dim(arr) <- c(nrow(arr), nr, nc)
arr
}
dim(f1(6, 5, 5))
[1] 6 5 142506
dim(fasterApproach(6, 5, 5))
[1] 142506 6 5
microbenchmark(f1(6, 5, 5), fasterApproach(6, 5, 5))
Unit: milliseconds
expr min lq mean median uq max neval cld
f1(6, 5, 5) 14.240662 22.09967 34.19180 24.15314 28.84547 125.2753 100 b
fasterApproach(6, 5, 5) 9.006603 10.15762 20.41521 15.87181 18.20326 115.5324 100 a
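To see that the two layouts hold the same matrices, base R's aperm() can move the "which matrix" dimension from first to last. The small hand-written input below is illustrative only: three 2 x 2 matrices stored one per row, in the order permuteGeneral() printed them above.

```r
# Illustrative input: one flattened 2 x 2 matrix per row (row-major layout)
nr <- 2
nc <- 2
perm_rows <- rbind(c(0, 0, 0, 1),
                   c(0, 0, 1, 0),
                   c(0, 1, 0, 0))
row_major <- perm_rows
dim(row_major) <- c(nrow(perm_rows), nr, nc)  # slice i is row_major[i, , ]

# Move the matrix index to the last dimension: nr x nc x N, like f1()
col_major <- aperm(row_major, c(2, 3, 1))

# Both layouts describe the same matrices
identical(row_major[2, , ], col_major[, , 2])  # TRUE
```

Note that aperm() makes a full copy, so for very large results it may be cheaper to keep the row-major array and index it as arr[i, , ].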