R: keep randomly generating numbers until all numbers within a specified range are present

Question

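My goal is to use R to randomly generate a vector of integers drawn from the numbers 1 to 8. However, I want to keep growing the vector until every number in 1:8 is represented at least once, e.g. 1,4,6,2,2,3,5,1,4,7,6,8.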

I can generate a single number or a sequence of numbers using sample:

x <- sample(1:8, 1, replace = TRUE)
x
[1] 6

I tried the repeat function to see how it works together with sample, and I can at least get the generation to stop when one specific number appears, e.g.

repeat {
   print(x)
   x = sample(1:8, 1, replace=T)
   if (x == 3){
       break
   }
}

This gives:

[1] 3
[1] 6
[1] 6
[1] 6
[1] 6
[1] 6
[1] 2

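I'm now struggling to work out how to stop the number generation once every number in 1:8 has appeared. Also, I know the code above only prints the generated sequence rather than storing it as a vector. Any suggestions pointing me in the right direction would be greatly appreciated!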

Answer 1

This works fine for 1:8, but it may not always be a good idea.

foo = integer(0)    
set.seed(42)
while(TRUE){
    foo = c(foo, sample(1:8, 1))
    if(all(1:8 %in% foo)) break
}
foo
# [1] 8 8 3 7 6 5 6 2 6 6 4 6 8 3 4 8 8 1

If you have more than 1:8, it may be better to obtain the average number of tries (N) required to get all the numbers at least once, and then sample N numbers in such a way that every number is sampled at least once.
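The "average number of tries" here is the expectation from the coupon collector's problem. Once k distinct values out of n have been seen, each new draw is unseen with probability (n - k)/n, so the wait for the next new value is geometric with mean n/(n - k). Summing over k = 0, ..., n - 1 gives the formula below; the code's `sum(length(vec)/(1:length(vec)))` computes exactly this sum, so for 1:8 it yields N = 22 after `ceiling`.

```latex
E[T_n] = \sum_{k=0}^{n-1} \frac{n}{n-k} = \sum_{i=1}^{n} \frac{n}{i} = n H_n,
\qquad
E[T_8] = 8\left(1 + \tfrac{1}{2} + \cdots + \tfrac{1}{8}\right) = \tfrac{761}{35} \approx 21.74
```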

set.seed(42)
vec = 1:8
N = ceiling(sum(length(vec)/(1:length(vec))))
foo = sample(c(vec, sample(vec, N - length(vec), TRUE)))
foo
# [1] 3 6 8 3 8 8 6 4 5 6 1 6 4 6 6 3 5 7 2 2 7 8
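As a rough sanity check on that formula, one can simulate the sequential process many times and compare the mean stopping length with n * H_n. This is a minimal sketch; `draws_until_complete` is a hypothetical helper that is not part of either answer, and the seed and replicate count are arbitrary choices.

```r
# Hypothetical helper (not from either answer): runs the sequential process
# once and returns how many draws it took to see every value in 1:n.
draws_until_complete <- function(n) {
  seen <- logical(n)                 # seen[k] becomes TRUE once k has been drawn
  count <- 0L
  while (!all(seen)) {
    seen[sample.int(n, 1L)] <- TRUE  # record one uniform draw from 1:n
    count <- count + 1L
  }
  count
}

n <- 8
expected <- n * sum(1 / seq_len(n))  # coupon-collector mean, about 21.74 for n = 8
set.seed(1)
simulated <- mean(replicate(10000, draws_until_complete(n)))
c(expected = expected, simulated = simulated)
```

The simulated mean should land close to the expected value; note that individual runs vary a lot around it, so sampling a fixed N does not reproduce the length distribution of the sequential loop.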

Answer 2

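Following d.b's hint, here is a slightly more verbose approach that is more memory-efficient (and also a bit faster, though I doubt speed is your concern):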

Differences:

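- It preallocates memory in chunks (of size 100 here), which mitigates the cost of growing a vector one element at a time; allocating and extending 100 (or even 1000) elements at once is much cheaper.
- It compares only the newest number on each iteration instead of all of them (the first n-1 numbers have already been tallied and don't need to be checked again).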
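Both ideas from the list can be combined in one self-contained function. This is a hedged sketch, not r2evans' code: instead of filtering an `unseen` vector, it tracks seen values with a logical vector plus a counter, so each draw costs O(1). The names `draw_until_all_seen` and `chunk` are hypothetical.

```r
# Hypothetical sketch: chunked preallocation plus an O(1) "is it new?" check.
draw_until_all_seen <- function(n, chunk = 100L) {
  out <- integer(chunk)       # preallocated buffer
  seen <- logical(n)          # seen[k] is TRUE once k has been drawn
  remaining <- n              # how many values in 1:n are still missing
  count <- 0L
  while (remaining > 0L) {
    # grow the buffer by one chunk only when it is full
    if (count == length(out)) out <- c(out, integer(chunk))
    num <- sample.int(n, 1L)
    count <- count + 1L
    out[count] <- num
    if (!seen[num]) {         # only the newest draw is inspected
      seen[num] <- TRUE
      remaining <- remaining - 1L
    }
  }
  out[seq_len(count)]         # drop unused preallocated slots
}

set.seed(42)
x <- draw_until_all_seen(8)
x
```

By construction the loop stops on the draw that supplies the last missing value, so the final element of the result appears nowhere earlier in it.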

Code:

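library(microbenchmark)
n <- 8  # assumed: size of the range 1:n, which the original snippet leaves undefined
microbenchmark(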
  r2evans = {
    emptyvec100 <- integer(100)
    counter <- 0
    out <- integer(0)
    unseen <- seq_len(n)
    set.seed(42)
    repeat {
      if (counter %% 100 == 0) out <- c(out, emptyvec100)
      counter <- counter+1
      num <- sample(n, size=1)
      unseen <- unseen[unseen != num]
      out[counter] <- num
      if (!length(unseen)) break
    }
    out <- out[1:counter]
  },
  d.b = {
    foo = integer(0)    
    set.seed(42)
    while(TRUE){
      foo = c(foo, sample(1:n, 1))
      if(all(1:n %in% foo)) break
    }
  }, times = 100, unit = 'us')
# Unit: microseconds
#     expr      min       lq     mean   median       uq      max neval
#  r2evans 1090.007 1184.639 1411.531 1228.947 1320.845 11344.24  1000
#      d.b 1242.440 1372.264 1835.974 1441.916 1597.267 14592.74  1000

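(This is neither code golf nor speed optimization. My main goal is to argue against growing vectors one element at a time and to suggest a more efficient comparison technique.)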

As d.b further suggested, this works fine for 1:8 but may run into trouble with larger problems. If we scale n up:

(Edit: with d.b's code changes, the execution times are much closer and not nearly as exponential-looking. Apparently removing unique made a big difference to his code.)
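For larger n, one further option (not proposed in either answer) is to avoid the per-draw loop altogether: draw iid values in batches and cut the sequence at the first position where every value has appeared. Because the draws are independent, the truncated prefix has the same distribution as the one-at-a-time loop. This is a minimal sketch assuming n >= 1; `draw_batched` and its default `batch` size (the coupon-collector mean from Answer 1) are hypothetical choices.

```r
# Hypothetical sketch: batched draws, truncated at first full coverage of 1:n.
# Assumes n >= 1.
draw_batched <- function(n, batch = ceiling(n * sum(1 / seq_len(n)))) {
  x <- integer(0)
  repeat {
    x <- c(x, sample.int(n, batch, replace = TRUE))
    first_seen <- match(seq_len(n), x)  # first index of each value, NA if absent
    if (!anyNA(first_seen)) {
      return(x[seq_len(max(first_seen))])  # keep draws up to the last "first sighting"
    }
  }
}

set.seed(42)
y <- draw_batched(50)
length(y)
```

The trade-off is that some drawn values past the cut point are wasted, in exchange for doing the sampling and the coverage check in vectorized calls.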
