Sample without replacement, or duplicates, in R

Question

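I have a long list that contains quite a few duplicates: say 100,000 values, 20% of which are duplicates. I want to sample randomly from this list, placing all the values into groups, say 400 of them. However, I don't want any of the resulting groups to contain duplicate values, i.e. I want all 250 members of each group to be unique.

I have tried various permutation methods from vegan, picante and EcoSimR, but they either don't quite do what I'm after or seem to struggle with this amount of data.

I was wondering whether there is some way of using the `sample` function that I can't figure out? Any help or alternative suggestions would be gratefully received...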

Answer 1

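As nico mentioned, you probably just need the `unique` function. Below is a very simple sampling routine that ensures no value is repeated across groups (it isn't entirely sensible, since you could just draw one large sample...)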

# Getting some random values to use here
set.seed(seed = 14412)
thevalues <- sample(x = 1:100,size = 1000,replace = TRUE)

# Obtaining the unique vector of those values
thevalues.unique <- unique(thevalues)

# Create a sample without replacement (i.e. take the ball out and don't put it back in)
sample1 <- sample(x = thevalues.unique,size = 10,replace = FALSE)

# Remove the sampled items from the vector of values
thevalues.unique <- thevalues.unique[!(thevalues.unique %in% sample1)]

# Another sample, and another removal
sample2 <- sample(x = thevalues.unique,size = 10,replace = FALSE)
thevalues.unique <- thevalues.unique[!(thevalues.unique %in% sample2)]
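The remark above about drawing one large sample can be sketched as follows. This is a minimal sketch, not part of the original answer: the helper name `draw_disjoint_groups` and its arguments `n_groups` and `group_size` are hypothetical. It draws once without replacement from the unique values and cuts the result into equal chunks, which should give the same guarantee as the repeated sample-and-remove steps.

```r
# Hypothetical helper (not from the original answer): draw one large sample
# without replacement from the unique values and cut it into disjoint groups.
draw_disjoint_groups <- function(values, n_groups, group_size) {
  pool <- unique(values)
  needed <- n_groups * group_size
  if (needed > length(pool)) {
    stop("not enough unique values for the requested groups")
  }
  # Sample positions rather than values: sample(x) treats a single number x as 1:x
  picked <- pool[sample.int(length(pool), size = needed)]
  # Consecutive chunks of group_size elements become the groups
  split(picked, rep(seq_len(n_groups), each = group_size))
}

set.seed(14412)
thevalues <- sample(x = 1:100, size = 1000, replace = TRUE)
groups <- draw_disjoint_groups(thevalues, n_groups = 5, group_size = 10)
```

`groups` is a list with one vector per group. Because every element comes from a single draw without replacement, no value can appear twice, either within a group or across groups.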

To do what eipi10 mentioned and get a weighted distribution, you just need to obtain the frequencies of the values first. One way:

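# Example data; the uniform prob here could be replaced by real weights
set.seed(seed = 14412)
thevalues <- sample(x = 1:100,size = 1000,replace = TRUE,prob = c(rep(0.01,100)))

# Sorted unique values, so they line up with the sorted names of table()
thevalues.unique <- unique(thevalues)
thevalues.unique <- thevalues.unique[order(thevalues.unique)]
# Relative frequency of each value in the original vector
thevalues.probs <- table(thevalues)/length(thevalues)
# Sample without replacement, weighted by how often each value occurred
sample1 <- sample(x = thevalues.unique,
                  size = 10,
                  replace = FALSE,
                  prob = thevalues.probs)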
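The weighted draw above relies on the sorted unique values matching the order of the `table()` output. As a hedged variant (the variable names `freq`, `candidates`, `weights` and `weighted_sample` are illustrative and not from the original answer), the candidates and their weights can both be taken from the same table, so the alignment is explicit and can be checked:

```r
set.seed(14412)
thevalues <- sample(x = 1:100, size = 1000, replace = TRUE)

# Counts per distinct value; the names of the table are the values themselves
freq <- table(thevalues)
candidates <- as.integer(names(freq))    # values, in the same order as the counts
weights <- as.numeric(freq) / sum(freq)  # relative frequencies

# Sanity check: table() orders its entries like sort(unique(...))
stopifnot(identical(candidates, sort(unique(thevalues))))

# Weighted sample without replacement: frequent values are more likely to be picked
weighted_sample <- candidates[sample.int(length(candidates), size = 10, prob = weights)]
```

Taking both vectors from one table avoids a silent mismatch if the unique values were ever left unsorted.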

r

permutation

random-sample