Split string with n repetitive elements into n sub-strings

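I have a string that is a concatenation of elements of m possible types; for simplicity, m = 4 with A, B, C and D.

Whenever a single element occurs more than once, I have to split the string so that no element is repeated within a string. However, I want to generate all possible strings that contain no repeated elements.

To make this clearer, here is an example. For A B A C D:

  1. String: A B C D
  2. String: B A C D

This gets more complicated when several different elements occur more than once. For A B A C B D:

  1. String: A B C D
  2. String: A C B D
  3. String: B A C D
  4. String: A C B D

Is there a clever way to compute this in R?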

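vec <- c("A","B","A","C","B","D")
# positions at which each distinct element occurs
combs <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
# every way of keeping exactly one occurrence of each element
eg <- do.call(expand.grid, combs)
# order the element names by the positions that were kept
out <- t(apply(eg, 1, function(r) names(eg)[order(r)]))
out
#      [,1] [,2] [,3] [,4]
# [1,] "A"  "B"  "C"  "D" 
# [2,] "B"  "A"  "C"  "D" 
# [3,] "A"  "C"  "B"  "D" 
# [4,] "A"  "C"  "B"  "D" 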
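
To see why ordering the columns works, it may help to look at the intermediate grid. This is only an illustrative sketch that re-runs the same steps on the same vector: each row of `eg` holds one chosen position per element, and sorting the element names by those positions gives one output row.

```r
vec <- c("A","B","A","C","B","D")
combs <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
eg <- do.call(expand.grid, combs)
eg
#   A B C D
# 1 1 2 4 6
# 2 3 2 4 6
# 3 1 5 4 6
# 4 3 5 4 6

# second row: A kept at position 3, B at position 2, so B sorts before A
second <- names(eg)[order(unlist(eg[2, ]))]
second
# [1] "B" "A" "C" "D"
```

Rows 3 and 4 differ only in which A is kept; since both A positions come before the B at position 5, they give the same result, which is why "A C B D" shows up twice.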

With the first vector:

vec <- c("A","B","A","C","D")
# ...

#      [,1] [,2] [,3] [,4]
# [1,] "A"  "B"  "C"  "D" 
# [2,] "B"  "A"  "C"  "D" 

If you are starting and ending with strings rather than vectors, note that you can wrap the above with:

strsplit("ABACBD", "")[[1]]
# [1] "A" "B" "A" "C" "B" "D"
apply(out, 1, paste, collapse = "")
# [1] "ABCD" "BACD" "ACBD" "ACBD"
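
Putting the pieces together, the string-in, string-out version could be sketched as a single function. The name `split_repeats` and the `dedupe` argument are hypothetical, not part of the answer above, and the sketch assumes a non-empty input in which every element is a single character.

```r
# Hypothetical helper: name and `dedupe` argument are illustrative assumptions.
split_repeats <- function(s, dedupe = FALSE) {
  vec <- strsplit(s, "")[[1]]
  # positions at which each distinct element occurs
  combs <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
  # one row per way of keeping a single occurrence of every element
  eg <- do.call(expand.grid, combs)
  # order the element names by the kept positions and paste them together
  res <- unname(apply(eg, 1, function(r) paste(names(eg)[order(r)], collapse = "")))
  if (dedupe) unique(res) else res
}

split_repeats("ABACBD")
# [1] "ABCD" "BACD" "ACBD" "ACBD"
split_repeats("ABACBD", dedupe = TRUE)
# [1] "ABCD" "BACD" "ACBD"
split_repeats("ABACD")
# [1] "ABCD" "BACD"
```

Whether the repeated "ACBD" should be kept depends on whether you want one result per choice of positions or one per distinct string; `dedupe = TRUE` gives the latter via `unique()`.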