Randomly assign different values to rows with different probabilities in R
I have a data frame like this:
ID var
1 NA
2 NA
3 NA
4 NA
...
I need to randomly assign var:
20% of the rows should get the value A, 30% should get B, and 50% should get C.
Is there an efficient way to do this?
Assuming you have a data frame named df, you can write:
randvar = sample(c('A','B','C'),size = nrow(df),prob = c(0.2,0.3,0.5),replace = TRUE)
df$var = randvar
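As a quick check of what this produces, here is a small sketch on a made-up 1000-row data frame (the data frame and the seed are assumptions for illustration only). Because each row is drawn independently, the observed shares land near 20/30/50 but are usually not exactly those values.

```r
# Hypothetical example data: 1000 rows with an empty var column
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(ID = 1:1000, var = NA)

# Draw one label per row, independently, with the given probabilities
df$var <- sample(c("A", "B", "C"), size = nrow(df),
                 prob = c(0.2, 0.3, 0.5), replace = TRUE)

# Observed proportions: close to 0.2 / 0.3 / 0.5, but not guaranteed exact
props <- prop.table(table(df$var))
print(props)
```

If the shares must match exactly, independent draws like this are not enough; the counts have to be fixed up front.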
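If you want the proportions to be exact, with "A" on exactly 20% of the rows, "B" on 30%, and "C" on 50%,
then it takes more than one line. Assuming c(0.2,0.3,0.5)*nrow(df) are all whole numbers, my answer is: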
n = nrow(df)
df$var = "C" # initialize every value to "C"
index = 1:n
indexa = sample(index, round(0.2*n)) # pick 20% of the indices for "A"; round() guards against floating-point error in 0.2*n
indexb = sample(setdiff(index, indexa), round(0.3*n)) # pick 30% for "B", excluding the "A" indices already picked
df$var[indexa] = "A" # assign "A" to df$var at indexa
df$var[indexb] = "B" # assign "B" to df$var at indexb
# the remaining 50% stay "C"
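An alternative sketch that also gives exact counts: build a vector holding the right number of each label, then shuffle it. This is not taken from the answer above; it is a common idiom, and the example data frame, the seed, and the choice to hand any rounding remainder to "C" are all assumptions.

```r
# Hypothetical example data: 20 rows with an empty var column
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(ID = 1:20, var = NA)
n <- nrow(df)

# Fix the counts up front; round() avoids floating-point surprises
n_a <- round(0.2 * n)
n_b <- round(0.3 * n)
n_c <- n - n_a - n_b  # assumption: any rounding remainder goes to "C"

# Build the labels in order, then shuffle them across the rows
labels <- rep(c("A", "B", "C"), times = c(n_a, n_b, n_c))
df$var <- sample(labels)  # sample() without size returns a random permutation

print(table(df$var))
```

When n times the proportions are not whole numbers, the counts can only approximate 20/30/50, so some rule for the remainder (here, giving it to "C") has to be chosen.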