Generate a random distribution by group conditional on a column
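I want to generate values from two different distributions depending on a column. For example, if `z1` is above 25, I generate from a normal distribution with `rnorm()`; otherwise, from a Poisson distribution with `rpois()`. In addition, I want the draws from the chosen distribution to vary by group (column `id`).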
Right now I have the following code:
df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L), z1 = c(21L, 21L, 21L, 28L, 28L, 28L, 30L, 30L, 30L,
20L, 20L, 20L)), row.names = c(NA, -12L), class = "data.frame")
df$sample <- with(df, ifelse(z1 > 25,
rnorm(n = 1,mean = 0,sd = 1), ##Normal(0,1)
rpois(n = 1,lambda = 5))) ## Poisson(5)
# id z1 sample
# 1 1 21 6.0000000
# 2 1 21 6.0000000
# 3 1 21 6.0000000
# 4 2 28 -0.8036847
# 5 2 28 -0.8036847
# 6 2 28 -0.8036847
# 7 3 30 -0.8036847
# 8 3 30 -0.8036847
# 9 3 30 -0.8036847
# 10 4 20 6.0000000
# 11 4 20 6.0000000
# 12 4 20 6.0000000
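Unfortunately, as you can see above, I don't get variation across the `id` groups: every group on the same side of the threshold gets the same value.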
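The cause is that `rnorm(n = 1, ...)` and `rpois(n = 1, ...)` are each evaluated only once, and `ifelse()` then recycles that single value to every position its test selects. A minimal sketch reproducing this with a plain vector (the values of `z1` mirror the example data; the seed is arbitrary):

```r
set.seed(42)
z1 <- c(21, 28, 30, 20)

# rnorm(n = 1) and rpois(n = 1) run exactly once each; ifelse() then
# recycles that single value to every position the test selects
out <- ifelse(z1 > 25, rnorm(n = 1), rpois(n = 1, lambda = 5))
print(out)

# Every element above the threshold shares one normal draw,
# and every element below it shares one Poisson draw
length(unique(out[z1 > 25]))   # 1
length(unique(out[z1 <= 25]))  # 1
```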
Below is the output I want, in the `desired_sample` column:
# id z1 sample desired_sample
# 1 1 21 6.0000000 5.0000000
# 2 1 21 6.0000000 5.0000000
# 3 1 21 6.0000000 5.0000000
# 4 2 28 -0.8036847 0.7356226
# 5 2 28 -0.8036847 0.7356226
# 6 2 28 -0.8036847 0.7356226
# 7 3 30 -0.8036847 -1.359669
# 8 3 30 -0.8036847 -1.359669
# 9 3 30 -0.8036847 -1.359669
# 10 4 20 6.0000000 4.0000000
# 11 4 20 6.0000000 4.0000000
# 12 4 20 6.0000000 4.0000000
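One way to get this without `ave()` is to draw one value per group (rather than one in total) and then map each group's draw back onto its rows with `match()`. This is a minimal sketch, assuming, as in the example data, that `z1` is constant within each `id`; `grp` and `k` are illustrative names, and the numbers will differ from the table above because they are random.

```r
set.seed(123)
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

# Assumes z1 is constant within each id, so the first row of a group
# determines which distribution that group uses
grp <- df[!duplicated(df$id), c("id", "z1")]
k <- nrow(grp)

# Draw k values from each distribution (one per group) and let ifelse()
# pick element-wise, so no single value gets recycled across groups
grp$draw <- ifelse(grp$z1 > 25,
                   rnorm(n = k, mean = 0, sd = 1),
                   rpois(n = k, lambda = 5))

# Broadcast each group's draw back to all of its rows
df$desired_sample <- grp$draw[match(df$id, grp$id)]
df
```

Drawing `k` values from both distributions wastes the draws that `ifelse()` discards, which is harmless at this size and keeps the code vectorized.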
[Follow-up]
The code below does it, but...
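# Called once per id group by ave(): one draw from Normal(0, 1) if z1 >= 25, otherwise from Poisson(5)
con_dist2 <- function(x){
  ifelse(x >= 25,
         return(rnorm(1, mean = 0, sd = 1)),
         return(rpois(1, lambda = 5)))
}
df$desired_sample2 <- with(df, ave(x = z1, id, FUN = con_dist2))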
...is there any way to include the threshold (25) as an input to the function `con_dist2`, to make it more flexible and reusable?
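The follow-up version works because `ave()` calls `FUN` once per `id` group and recycles the length-1 result over that group's rows. Inside `con_dist2` it relies on lazy evaluation: `ifelse()` only evaluates `yes` or `no` when it needs them, and the `return()` inside whichever one it evaluates exits `con_dist2` immediately with a single draw. A plain scalar `if` may express the same intent more directly. The sketch below assumes `z1` is constant within each `id` (so `x[1]` decides the branch); `con_dist_if` is a hypothetical name.

```r
set.seed(1)
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

# Hypothetical helper: x is the vector of z1 values for one id group.
# Assumes z1 is constant within a group, so x[1] decides the distribution.
con_dist_if <- function(x) {
  if (x[1] >= 25) {
    rnorm(1, mean = 0, sd = 1)  # Normal(0, 1)
  } else {
    rpois(1, lambda = 5)        # Poisson(5)
  }
}

# ave() calls FUN once per id and recycles the single draw over the group
df$desired_sample2 <- with(df, ave(x = z1, id, FUN = con_dist_if))
df
```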
Try making this change to your code:
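#Function
con_dist2 <- function(x, n){
  ifelse(x >= n,
         return(rnorm(1, mean = 0, sd = 1)),
         return(rpois(1, lambda = 5)))
}
#Apply
# ave()'s ... argument holds the grouping variables, so n is supplied
# through an anonymous wrapper function instead
df$desired_sample2 <- with(df, ave(x = z1, id, FUN = function(x) con_dist2(x, n = 25)))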
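An alternative to writing the anonymous wrapper at each call site is a small function factory that fixes the threshold once and returns a one-argument function that `ave()` can use directly. This is a sketch, not the answer's method; `make_con_dist`, `above_25` and `all_normal` are hypothetical names, and it again assumes `z1` is constant within each `id`.

```r
set.seed(2)
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

# Hypothetical factory: fixes the threshold and returns a one-argument
# function suitable for ave(..., FUN = ).
make_con_dist <- function(threshold) {
  force(threshold)  # evaluate the argument now, not on the first call
  function(x) {
    # x[1] decides the branch, assuming z1 is constant within the group
    if (x[1] >= threshold) rnorm(1, mean = 0, sd = 1) else rpois(1, lambda = 5)
  }
}

above_25 <- make_con_dist(25)
df$desired_sample2 <- with(df, ave(x = z1, id, FUN = above_25))

# A different threshold only means building a new function
df$all_normal <- with(df, ave(x = z1, id, FUN = make_con_dist(0)))
df
```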
To pass more parameters, try this:
#Function 2
con_dist2 <- function(x, n, mymean){
  ifelse(x >= n,
         return(rnorm(1, mean = mymean, sd = 1)),
         return(rpois(1, lambda = 5)))
}
#Apply 2
df$desired_sample2 <- with(df, ave(x = z1, id, FUN = function(x) con_dist2(x, n = 25, mymean = 0)))
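If the list of parameters keeps growing, one option is to pass the two samplers themselves as arguments instead of one argument per distribution parameter. The sketch below is a hypothetical generalisation (`con_dist3`, `above`, `below` are illustrative names); each sampler is assumed to take no arguments and return a single value, and `z1` is assumed constant within each `id`.

```r
set.seed(3)
df <- data.frame(id = rep(1:4, each = 3),
                 z1 = rep(c(21L, 28L, 30L, 20L), each = 3))

# Hypothetical generalisation: the samplers are arguments with defaults
# matching the original Normal(0, 1) / Poisson(5) choice.
con_dist3 <- function(x, n,
                      above = function() rnorm(1, mean = 0, sd = 1),
                      below = function() rpois(1, lambda = 5)) {
  if (x[1] >= n) above() else below()
}

# Defaults: Normal(0, 1) at or above the threshold, Poisson(5) below it
df$default_draw <- with(df, ave(x = z1, id,
                                FUN = function(x) con_dist3(x, n = 25)))

# Swap in other distributions without changing con_dist3 itself
df$custom_draw <- with(df, ave(x = z1, id, FUN = function(x)
  con_dist3(x, n = 25,
            above = function() rnorm(1, mean = 10, sd = 2),
            below = function() runif(1, min = 0, max = 1))))
df
```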