Selecting from a dataset using set probabilities in R

Question

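I am running some simulations for a selection experiment I am carrying out. As part of this, I want to select from a dataset I have already simulated, using probabilities to simulate selection.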

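I first create the initial population using starting frequencies, where the probability of getting a 1 is 0.25, a 2 is 0.5 and a 3 is 0.25. The values 1, 2 and 3 represent 3 different genotypes.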

N <- 400
my_prob = c(0.25,0.5,0.25)
N1=sample(c(1:3), N, replace= TRUE, prob=my_prob)
P1 <-data.frame(N1)

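I now want to simulate selection in my population, where one homozygote is selected against and there is partial selection against the heterozygote, so the survival rates are ((1-s)^2, (1-s), 1), where s = 0.2 in this example. Initially I used the sample_frac() function to sample each group separately and then recombined the dataset.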

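library(dplyr)  # provides filter(), sample_frac() and %>%

s <- 0.2
# Genotype 1: keep a fixed fraction (1-s)^2
S1homo <- filter(P1, N1 == 1) %>%
  sample_frac((1 - s)^2, replace = FALSE)
# Genotype 2: keep a fixed fraction (1-s)
S1hetero <- filter(P1, N1 == 2) %>%
  sample_frac((1 - s), replace = FALSE)
# Genotype 3: no selection, keep every individual
S1others <- filter(P1, N1 == 3)
S1 <- rbind(S1homo, S1hetero, S1others)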

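The problem is that the numbers it returns have no variability, which is not realistic. For example, with s = 0.2, S1homo always returns exactly 64% of the 1 values, whereas in my initial population there is some variation in the number of each value.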

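So I was wondering whether there is a way to select from my P1 population using the set probabilities ((1-s)^2, (1-s), 1) for the different genotypes, so that I do not always get exactly the same number returned for each group that is selected against. I tried to do this with the sample() function I used earlier, but I could not get it to work.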

# sel is the expected surviving fraction; multiplied by N it gives the total number of values in the new population
sel <-((1-s)^2 + 2*(1-s)+1)/4 
S1 <-sample(P1, N*sel, replace=FALSE, prob=c((1-s)^2,(1-s),1))

Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

Answer 1

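I am not 100% sure what you are trying to do, but if you want (1-s)^2 to be the probability that a randomly chosen element is included in the sample, rather than the exact percentage selected, you can use sample_n instead of sample_frac, with n chosen at random to reflect that rate: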

S1homo<- filter(P1, N1==1) %>%
    sample_n(rbinom(1,sum(N1==1),(1-s)^2))

Using rbinom like this may be a bit indirect, but I did not see another way to do it easily with %>%.
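For reference, here is a minimal base R sketch of the same idea applied to all three genotypes at once. It is an alternative formulation, not the pipeline above: instead of drawing a binomial count per group and then sampling that many rows, it makes one Bernoulli draw per individual, which should give the same distribution of survivor counts. The names `fitness` and `survives` are made up for this illustration, and `set.seed()` is only there so the sketch is reproducible.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility of this sketch

N <- 400
s <- 0.2
P1 <- data.frame(N1 = sample(1:3, N, replace = TRUE, prob = c(0.25, 0.5, 0.25)))

# Survival probability for genotypes 1, 2 and 3, in that order
fitness <- c((1 - s)^2, 1 - s, 1)

# One Bernoulli draw per individual, using the survival probability
# that corresponds to that individual's genotype
survives <- rbinom(nrow(P1), size = 1, prob = fitness[P1$N1]) == 1
S1 <- P1[survives, , drop = FALSE]

# Survivor counts per genotype; genotypes 1 and 2 vary from run to run,
# genotype 3 always survives in full
table(factor(S1$N1, levels = 1:3))
```

Removing the `set.seed()` call and rerunning should show the counts for genotypes 1 and 2 fluctuating around 64% and 80% of their starting numbers, rather than hitting those fractions exactly.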
