Why do gtools::combinations and permutations not work with a vector containing the same elements?

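Suppose I have a vector vec <- c("H", "H", "H", "H", "M", "M", "A", "A"). How can I get all combinations/permutations of, for example, 5 elements drawn from the 8, in the form of the expected output below?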

> head(t, 6)
     [,1] [,2] [,3] [,4] [,5]
[1,] "H"  "H"  "H"  "H"  "M" 
[2,] "H"  "H"  "H"  "H"  "M" 
[3,] "H"  "H"  "H"  "H"  "A" 
[4,] "H"  "H"  "H"  "H"  "A" 
[5,] "H"  "H"  "H"  "M"  "M" 
[6,] "H"  "H"  "H"  "M"  "A" 

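I tried gtools::combinations(), but I always get an error saying there are too few different elements, whether or not repeats are allowed. The same happens with gtools::permutations(). So I ended up with the following workaround, which took a lot of effort: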

t <- gtools::combinations(8, 5, vec, repeats.allowed = F)
Error in gtools::combinations(8, 5, vec, repeats.allowed = F) : 
  too few different elements


t <- gtools::combinations(8, 5, letters[1:8], repeats.allowed = F)

for ( i in 1:8) {
  if ( i <=4 ) {
    t[t == letters[i]] <- "H" 
  } else if (i <= 6) {
    t[t == letters[i]] <- "M" 
  } else if (i <= 8) {
    t[t == letters[i]] <- "A" 
  }
}

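I am looking for a simpler solution from any package or base R, and I would also like to know why my original call does not work. Thanks in advance.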

Another option:

combn(vec,5)

yields 56 combinations (choose(8, 5)), returned as the columns of a 5 x 56 matrix.

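apply(gtools::combinations(8, 5, repeats.allowed = FALSE), 2, \(x) vec[x]) does what you want. I don't know why the package requires distinct values when it is applied to a vector. The documentation is not clear on this.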
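One possible explanation for the error: as far as I can tell, gtools::combinations() (and likewise gtools::permutations()) has a set argument that defaults to TRUE and removes duplicates from v before comparing its length with n. With vec, only 3 distinct values remain, which is fewer than n = 8. The sketch below assumes that argument behaves this way in your installed version of gtools, so check ?gtools::combinations before relying on it.

```r
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")

# Default (set = TRUE): duplicates in v are dropped first, so the call fails
# with "too few different elements". try() captures the error.
default_call <- try(gtools::combinations(8, 5, vec), silent = TRUE)
inherits(default_call, "try-error")

# set = FALSE keeps v exactly as supplied, duplicates included.
res <- gtools::combinations(8, 5, vec, set = FALSE)
dim(res)  # one row per choice of 5 positions out of 8
```

Note that this still returns one row per choice of positions, so rows that contain the same letters appear more than once.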

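When you need combinations/permutations of a vector that contains repeats (that is, of a multiset), many of the functions available in base R and other packages produce unnecessary duplicate results that eventually have to be filtered out. For smaller problems this is not an issue, but the approach quickly becomes impractical.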
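To make the filtering approach concrete, here is a minimal base R sketch: generate every positional combination with combn(), sort within each row so that equal multisets look identical, and then drop the duplicate rows. The variable names are only illustrative.

```r
vec <- c("H", "H", "H", "H", "M", "M", "A", "A")

# combn() returns one combination per column, so transpose to get rows.
all_combos <- t(combn(vec, 5))

# Sort within each row so that rows holding the same letters become identical.
sorted_rows <- t(apply(all_combos, 1, sort))

# unique() on a matrix removes duplicated rows.
distinct_combos <- unique(sorted_rows)

nrow(all_combos)       # 56, i.e. choose(8, 5)
nrow(distinct_combos)  # 8 distinct multisets
```

Only 8 of the 56 generated rows survive, which is the waste that grows out of hand on larger inputs.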

Currently, there are a couple of packages capable of handling these types of problems: arrangements and RcppAlgos (I am the author).

vec <- c("H", "H", "H", "H", "M", "M", "A", "A")
tbl_v <- table(vec)

tbl_v
vec
A H M 
2 4 2 

library(RcppAlgos)
comboGeneral(names(tbl_v), 5, freqs = tbl_v)
     [,1] [,2] [,3] [,4] [,5]
[1,] "A"  "A"  "H"  "H"  "H" 
[2,] "A"  "A"  "H"  "H"  "M" 
[3,] "A"  "A"  "H"  "M"  "M" 
[4,] "A"  "H"  "H"  "H"  "H" 
[5,] "A"  "H"  "H"  "H"  "M" 
[6,] "A"  "H"  "H"  "M"  "M" 
[7,] "H"  "H"  "H"  "H"  "M" 
[8,] "H"  "H"  "H"  "M"  "M"

## For package arrangements we have:
## arrangements::combinations(names(tbl_v), 5, freq = tbl_v)

Similarly, for permutations, we have:

permuteGeneral(names(tbl_v), 5, freqs = tbl_v)
      [,1] [,2] [,3] [,4] [,5]
  [1,] "A"  "A"  "H"  "H"  "H" 
  [2,] "A"  "A"  "H"  "H"  "M" 
  [3,] "A"  "A"  "H"  "M"  "H" 
  [4,] "A"  "A"  "H"  "M"  "M" 
     .   .    .    .    .    .
     .   .    .    .    .    .
     .   .    .    .    .    .
[137,] "M"  "M"  "H"  "A"  "A" 
[138,] "M"  "M"  "H"  "A"  "H" 
[139,] "M"  "M"  "H"  "H"  "A" 
[140,] "M"  "M"  "H"  "H"  "H" 

## For package arrangements we have:
## arrangements::permutations(names(tbl_v), 5, freq = tbl_v)
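As a sanity check on the 140 rows, the number of multiset permutations can be counted in base R without generating them: enumerate every way to choose how many of each letter to use, then apply the multinomial coefficient k! / (a! h! m!) to each choice. This is only a counting sketch (freqs, k, and picks are illustrative names), not a description of how either package works internally.

```r
freqs <- c(A = 2, H = 4, M = 2)  # multiplicity of each distinct value in vec
k <- 5                           # length of each result

# Every way to use 0..freq copies of each letter, keeping those that total k.
picks <- expand.grid(lapply(freqs, function(f) 0:min(f, k)))
picks <- as.matrix(picks[rowSums(picks) == k, ])

# Each row of picks is one distinct combination.
n_combos <- nrow(picks)

# Multinomial coefficient per row; round() guards against floating-point
# noise, since factorial() is computed via gamma().
n_perms <- sum(round(factorial(k) / apply(factorial(picks), 1, prod)))

n_combos  # 8
n_perms   # 140
```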

Both packages contain algorithms that generate each result directly, without any filtering. This approach is much more efficient.

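For example, suppose we have big_vec <- rep(vec, 8) and we want all combinations of length 16. With the filtering approach, one would need to generate all combinations of a vector of length 64 choose 16 and then filter them. That is choose(64, 16) = 4.885269e+14 combinations in total, which would be very difficult.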

With these two packages, this problem is a breeze.

big_vec <- rep(vec, 8)
tbl_big_v <- table(big_vec)

tbl_big_v
big_vec
 A  H  M 
16 32 16 

system.time(test_big <- comboGeneral(names(tbl_big_v), 16,
                                     freqs = tbl_big_v))
   user  system elapsed 
      0       0       0 

dim(test_big)
[1] 153  16
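The 153 rows can be cross-checked in base R without generating anything. The number of multiset combinations of size k is the coefficient of x^k in the product of the polynomials (1 + x + ... + x^f), one per distinct value with multiplicity f. The helper below, count_multiset_combos, is a hypothetical name for a small sketch of that idea. It is not part of RcppAlgos or arrangements.

```r
# Count size-k combinations of a multiset given the multiplicities in freqs.
# coef[i + 1] holds the coefficient of x^i, truncated at degree k.
count_multiset_combos <- function(freqs, k) {
  coef <- c(1, rep(0, k))  # start with the constant polynomial 1
  for (f in freqs) {
    new_coef <- numeric(k + 1)
    # Multiply by (1 + x + ... + x^f): taking `take` copies shifts by `take`.
    for (take in 0:min(f, k)) {
      idx <- seq_len(k + 1 - take)
      new_coef[idx + take] <- new_coef[idx + take] + coef[idx]
    }
    coef <- new_coef
  }
  coef[k + 1]
}

count_multiset_combos(c(A = 16, H = 32, M = 16), 16)  # 153
count_multiset_combos(c(A = 2, H = 4, M = 2), 5)      # 8
```

Compared with the roughly 4.9e+14 positional combinations given by choose(64, 16), this shows how few distinct results actually exist.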