R: take 2 random non-overlapping samples (for same indexes) of length n out of a vector of length n

Question

Suppose I have a vector named all_combinations containing the numbers 1 to 20.

I need to extract 2 vectors (coding_1 and coding_2) of length number_of_peptide_clusters, which in my current case also happens to be 20.

The 2 new vectors should be sampled at random from all_combinations so that they do not overlap at any index position.

I do the following:

set.seed(3)
all_combinations=1:20
number_of_peptide_clusters=20
coding_1 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
coding_1
 [1]  5 12  7  4 10  8 11 15 17 16 18 13  9 20  2 14 19  1  3  6
coding_2 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
coding_2
 [1]  5  9 19 16 18 12  8  6 15  3 13 14  7  2 11 20 10  4 17  1

This example gives me trouble because exactly one number overlaps at the same index (5 at position 1).

What I would normally do in these cases is find the overlapping numbers and resample them from the list of all overlapping numbers...

Suppose coding_1 and coding_2 were:

coding_1
 [1]  5 9  7  4 10  8 11 15 17 16 18 13  12 20 2  14 19  1  3  6
coding_2
 [1]  5 9 19 16 18 12  8  6 15  3 13 14  7  2  11 20 10  4 17  1

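In this case 5 and 9 overlap at the same positions, so I would resample them in coding_2 from the full list of overlapping values [resample from c(5,9) so that index 1 is not 5 and index 2 is not 9]. coding_2 would then be: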

coding_2
 [1]  9 5 19 16 18 12  8  6 15  3 13 14  7  2  11 20 10  4 17  1
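The reshuffling step described above can be sketched roughly as follows. fix_overlaps is a hypothetical helper name, not something from base R, and the sketch assumes both vectors are permutations of the same values. It reshuffles the values of coding_2 only among the overlapping positions, which is why it has nothing to work with when a single position overlaps.

```r
# Hypothetical helper (illustration only): reshuffle the overlapping values
# of coding_2 among the overlapping positions until none matches coding_1.
fix_overlaps <- function(coding_1, coding_2) {
  idx <- which(coding_1 == coding_2)       # positions where both vectors agree
  if (length(idx) == 0) return(coding_2)   # nothing to fix
  if (length(idx) == 1) {
    stop("a single overlap cannot be fixed by reshuffling within the overlap set")
  }
  repeat {
    # sample.int() permutes the positions; it avoids the sample(x) pitfall
    # that appears when x is a single number
    shuffled <- coding_2[idx[sample.int(length(idx))]]
    if (!any(shuffled == coding_1[idx])) break
  }
  coding_2[idx] <- shuffled
  coding_2
}

coding_1 <- c(5, 9, 7, 4, 10, 8, 11, 15, 17, 16, 18, 13, 12, 20, 2, 14, 19, 1, 3, 6)
coding_2 <- c(5, 9, 19, 16, 18, 12, 8, 6, 15, 3, 13, 14, 7, 2, 11, 20, 10, 4, 17, 1)
fixed <- fix_overlaps(coding_1, coding_2)
fixed[1:2]
```

With only two overlapping positions the only valid reshuffle is the swap, so the first two entries of the result are 9 and 5.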

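However, in the first example above, where only one position overlaps, I cannot use this approach... So what is the best way to draw 2 samples of length 20 from a vector of length 20 such that the samples do not overlap at the same index positions?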

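It would be great if I could obtain the second sample, coding_2, with coding_1 already known... Otherwise, obtaining both at the same time is also acceptable if that makes things easier. Thanks!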

Answer 1

I think the best solution is simply to use a rejection strategy:

set.seed(3)
all_combinations <- 1:20
number_of_peptide_clusters <- 20
count <- 0
repeat {
  count <- count + 1
  message("Try number ", count)
  coding_1 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
  coding_2 <- sample(all_combinations, number_of_peptide_clusters, replace = FALSE)
  if (!any(coding_1 == coding_2))
    break
}
#> Try number 1
#> Try number 2
#> Try number 3
#> Try number 4
#> Try number 5
#> Try number 6
#> Try number 7
#> Try number 8
#> Try number 9
coding_1
#>  [1] 18 16 17 12 13  8  6 15  3  5 20  9 11  4 19  2 14  7  1 10
coding_2
#>  [1]  5 20 14  2 11  6  7 10 19  8  4  1 15  9 13 17 18 16 12  3

Created on 2020-11-04 by the reprex package (v0.3.0)
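If coding_1 is already known, the same rejection idea can probably be applied to coding_2 alone. The following is a minimal sketch under that assumption. sample_no_overlap and its max_tries argument are hypothetical names introduced here for illustration, and the sketch assumes the pool and the reference vector have the same length.

```r
# Hypothetical helper (not part of the original answer): draw a permutation
# of `pool` that differs from `reference` at every index, by rejection.
sample_no_overlap <- function(pool, reference, max_tries = 1000) {
  stopifnot(length(pool) == length(reference), length(pool) > 1)
  for (i in seq_len(max_tries)) {
    candidate <- pool[sample.int(length(pool))]   # random permutation of pool
    if (!any(candidate == reference)) return(candidate)
  }
  stop("no non-overlapping sample found in ", max_tries, " tries")
}

set.seed(3)
all_combinations <- 1:20
coding_1 <- sample(all_combinations)
coding_2 <- sample_no_overlap(all_combinations, coding_1)
any(coding_1 == coding_2)
```

For a random permutation of 20 values, the chance of having no position in common with a fixed permutation is close to 1/e (about 0.37), so on average fewer than three draws should be needed. Every accepted draw is equally likely among the valid permutations.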
