Duplicated values found after sampling a vector of no duplicates without replacement

Question

set.seed(999)
high1 <- c()
low1 <- c()
ss2 <- c()
x <- c(1,2,3,4,5,6,7,8)
for(k in 1:4){
  ss2 <- sample(x, 2, replace=FALSE)
  x <- x[-ss2] #after ss2 sampling, remove sample from the pool      
  high1 <- c(high1, max(ss2)) #append highest of ss2
  low1 <- c(low1, min(ss2)) #append lowest of ss2
  ss2 <- c() #init ss2 for next loop
}

high1 #\
low1  #/ both high1 and low1 should not have duplicated value since x<-1:8
ss2 #empty container after full sampling
x #should show empty vector after full kth loop

Since x is c(1,2,3,4,5,6,7,8), neither high1 nor low1 should contain a repeated value, but I ended up with

high1
#[1] 5 8 7 7

low1
#[1] 4 1 2 2

What went wrong?

Answer 1

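x[-ss2] is wrong. ss2 holds the sampled values, but a negative subscript drops positions, and the two only coincide while x is still 1:8 (the first iteration). You need to remove by index rather than by value: x[-match(ss2, x)].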
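
To make the difference concrete, here is a minimal sketch. The pool and the picked values are made up for illustration (they are not the actual draws from the question); they stand for a later iteration in which 4 and 5 have already been removed.

```r
# Hypothetical pool after the values 4 and 5 have already been removed
x <- c(1, 2, 3, 6, 7, 8)
picked <- c(8, 1)   # sampled VALUES, not positions

# Negative subscripts drop positions 8 and 1. Position 8 does not exist
# in a length-6 vector, so it is silently ignored and the value 8 stays.
by_value <- x[-picked]
by_value
# [1] 2 3 6 7 8

# match() turns the values into their positions (6 and 1) first
by_position <- x[-match(picked, x)]
by_position
# [1] 2 3 6 7
```

Because 8 survives in the first version, it can be drawn again in a later iteration, which is how duplicates end up in high1 and low1.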

After the fix (still using your set.seed(999)) I get

high1
#[1] 5 7 8 6

low1
#[1] 4 1 2 3
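
For reference, the whole loop with that one-line change might look like the sketch below. The final stopifnot() check is my addition, not part of the original code, and I have left out printed values because the exact draws depend on the R version's sampling algorithm.

```r
set.seed(999)
x <- 1:8
high1 <- integer(0)
low1 <- integer(0)

for (k in 1:4) {
  ss2 <- sample(x, 2, replace = FALSE)
  x <- x[-match(ss2, x)]        # remove the sampled values by their positions
  high1 <- c(high1, max(ss2))   # append the larger of the pair
  low1 <- c(low1, min(ss2))     # append the smaller of the pair
}

# Added sanity check: every value used exactly once, pool exhausted
stopifnot(!anyDuplicated(c(high1, low1)), length(x) == 0)
```

Since x has no duplicates, x <- setdiff(x, ss2) should work equally well here.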

A hint at a vectorized solution (not necessarily the most efficient):

set.seed(999)
x <- 1:8
record <- matrix(sample(x), 2)
high1 <- pmax(record[1, ], record[2, ])
#[1] 5 7 8 3
low1 <- pmin(record[1, ], record[2, ])
#[1] 4 1 6 2

Interestingly, the vectorized approach gives a different result from the loop.
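
If the values are not simply 1:n, one way to sidestep the value/position confusion altogether is to shuffle positions and index with them. The following is only a sketch; pair_up is a hypothetical helper name, and the input vector is made up for illustration.

```r
# pair_up() is a hypothetical helper, not from the original answer.
# It randomly pairs the elements of an even-length vector and returns
# the larger and smaller member of each pair.
pair_up <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  shuffled <- x[sample.int(length(x))]   # permute positions, then index
  pairs <- matrix(shuffled, nrow = 2)    # each column is one pair
  list(high = pmax(pairs[1, ], pairs[2, ]),
       low  = pmin(pairs[1, ], pairs[2, ]))
}

set.seed(999)
res <- pair_up(c(10, 20, 30, 40, 50, 60, 70, 80))

# Every input value appears exactly once across high and low
stopifnot(setequal(c(res$high, res$low), seq(10, 80, by = 10)))
```

Sampling positions with sample.int() also avoids the documented sample() behaviour where a single number x of at least 1 is treated as 1:x.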

Tags: loops, for-loop, r, vector, sampling