Generate a random distribution by group conditional on a column

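I want to draw from two different distributions depending on a column. For example, if z1 is above 25, I draw from a normal distribution with rnorm(); otherwise I draw from a Poisson distribution with rpois(). In addition, I want the draws from the chosen distribution to vary by group (column id).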

Right now I have the following code:

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
                      4L, 4L), z1 = c(21L, 21L, 21L, 28L, 28L, 28L, 30L, 30L, 30L, 
                                      20L, 20L, 20L)), row.names = c(NA, -12L), class = "data.frame")  
  
df$sample  <- with(df, ifelse(z1 > 25, 
                         rnorm(n = 1,mean = 0,sd = 1), ##Normal(0,1)
                         rpois(n = 1,lambda = 5)))     ## Poisson(5) 

  # id z1     sample
  # 1   1 21  6.0000000
  # 2   1 21  6.0000000
  # 3   1 21  6.0000000
  # 4   2 28 -0.8036847
  # 5   2 28 -0.8036847
  # 6   2 28 -0.8036847
  # 7   3 30 -0.8036847
  # 8   3 30 -0.8036847
  # 9   3 30 -0.8036847
  # 10  4 20  6.0000000
  # 11  4 20  6.0000000
  # 12  4 20  6.0000000

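Unfortunately, as you can see above, I get no variation across the id groups (column id): every row that draws from the same distribution receives the same value. Below is the output I want, shown in the desired_sample column: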
  
  #     id z1     sample     desired_sample
  # 1   1 21  6.0000000  5.0000000
  # 2   1 21  6.0000000  5.0000000
  # 3   1 21  6.0000000  5.0000000
  # 4   2 28 -0.8036847  0.7356226
  # 5   2 28 -0.8036847  0.7356226
  # 6   2 28 -0.8036847  0.7356226
  # 7   3 30 -0.8036847 -1.359669
  # 8   3 30 -0.8036847 -1.359669
  # 9   3 30 -0.8036847 -1.359669
  # 10  4 20  6.0000000  4.0000000
  # 11  4 20  6.0000000  4.0000000
  # 12  4 20  6.0000000  4.0000000

[Follow-up]

The code below does the job, but...

con_dist2 <- function(x){
  ifelse( x>=25,
          return(rnorm(1,mean = 0 , sd = 1 )),
          return(rpois(1,lambda = 5 )))
}

df$desired_sample2 <- with(df, ave(x = z1, id, FUN = con_dist2))

...is there any way to pass the threshold (25) to con_dist2 as an argument, to make the function more flexible and reusable?

Try making this change to your code:

#Function
con_dist2 <- function(x,n){
  ifelse( x>=n,
          return(rnorm(1,mean = 0 , sd = 1 )),
          return(rpois(1,lambda = 5 )))
}
#Apply
df$desired_sample2<- with(df ,ave(x = z1, id, FUN = function(x) con_dist2(x,n=25)) )

For more parameters, try this:

#Function 2
con_dist2 <- function(x,n,mymean){
  ifelse( x>=n,
          return(rnorm(1,mean = mymean , sd = 1 )),
          return(rpois(1,lambda = 5 )))
}
#Apply 2
df$desired_sample2<- with(df ,ave(x = z1, id, FUN = function(x) con_dist2(x,n=25,mymean = 0)) )
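
As a complement to the functions above, the same idea can be sketched with a plain if/else on the group's first value, which avoids calling return() inside ifelse(). This is only a sketch: the helper name draw_one and its parameters threshold, mu, sigma and lambda are hypothetical, and it assumes z1 is constant within each id group, as in the example data.

```r
# Hypothetical helper (not part of the original answer): one draw per group.
#   x         : the z1 values of a single group (assumed constant within it)
#   threshold : cutoff that decides which distribution is used
#   mu, sigma : mean and standard deviation of the normal draw
#   lambda    : rate of the Poisson draw
draw_one <- function(x, threshold = 25, mu = 0, sigma = 1, lambda = 5) {
  if (x[1] >= threshold) {
    rnorm(1, mean = mu, sd = sigma)
  } else {
    rpois(1, lambda = lambda)
  }
}

# Same layout as the example data in the question.
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21, 28, 30, 20), each = 3))

set.seed(42)  # only to make the draws reproducible

# ave() treats its extra unnamed arguments as grouping variables, so the
# threshold is supplied through an anonymous wrapper around draw_one().
df$desired_sample <- with(df, ave(z1, id,
                                  FUN = function(x) draw_one(x, threshold = 25)))
```

Because ave() calls FUN once per id group and recycles the single returned value over that group's rows, every id gets its own draw. In the first attempt, by contrast, rnorm(n = 1) and rpois(n = 1) were each evaluated only once inside ifelse(), so one value was recycled across all matching rows.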