Sample from different columns in R
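I have a probability vector, say
prob=c(0.1,0.8,0.1)
and a data frame: df=cbind(c("A","B","A"),c(1,2,3),c("q","v","z"))
I want to sample n objects from df with replacement, where an object comes from the first column with probability 0.1, from the second column with probability 0.8, and from the third column with probability 0.1.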
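We'll unlist the data.frame and adjust our prob vector on the fly so that it has the right length: each column's probability is repeated once for every row in that column.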
df <- data.frame(c("A","B","A"), c(1,2,3), c("q","v","z"), stringsAsFactors = FALSE)
n <- 5
set.seed(1)
unname(sample(unlist(df), n, replace = TRUE, prob= rep(prob, each = nrow(df))))
# [1] "3" "1" "A" "z" "2"
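Why does rep(prob, each = nrow(df)) give the intended column probabilities? sample() only needs weights proportional to the probabilities (they need not sum to 1), so a column's chance is its summed weight divided by the total weight. A minimal sketch that checks this exactly and by simulation, assuming every column has the same number of rows (which a data.frame guarantees); the seed and the 1e5 draws are arbitrary choices:

```r
# Same setup as above
prob <- c(0.1, 0.8, 0.1)
df <- data.frame(c("A","B","A"), c(1,2,3), c("q","v","z"), stringsAsFactors = FALSE)

# One weight per cell, in the order unlist() returns the cells (column by column)
w <- rep(prob, each = nrow(df))

# Each column's share = its summed weight / total weight
col_share <- colSums(matrix(w, nrow = nrow(df))) / sum(w)
col_share
# [1] 0.1 0.8 0.1

# Empirical check: sample cell positions and record which column each came from
set.seed(42)
idx <- sample(seq_along(w), 1e5, replace = TRUE, prob = w)
col_of_draw <- (idx - 1) %/% nrow(df) + 1
prop.table(table(col_of_draw))  # proportions should be close to 0.1, 0.8, 0.1
```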
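If you actually start from a matrix rather than a data.frame, it's a little shorter, because sample() reads a matrix column by column: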
df=cbind(c("A","B","A"),c(1,2,3),c("q","v","z"))
set.seed(1)
sample(df, n, replace = TRUE, prob= rep(prob, each = nrow(df)))
# [1] "3" "1" "A" "z" "2"
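For the matrix case, the weight vector can also be built with base R's col(), which returns each cell's column index in the same column-major order that sample() uses. This is a sketch of an equivalent alternative, not part of the original answer; the name m is just used here for the matrix:

```r
m <- cbind(c("A","B","A"), c(1,2,3), c("q","v","z"))
prob <- c(0.1, 0.8, 0.1)

# col(m) holds each cell's column number; indexing prob with it gives one weight per cell
w <- prob[col(m)]
identical(w, rep(prob, each = nrow(m)))
# [1] TRUE

n <- 5
set.seed(1)
sample(m, n, replace = TRUE, prob = w)  # same draws as the rep() version, since the weights are identical
```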
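From a list (in reply to a comment). The list elements can have different lengths, so each element's probability is split evenly across its items with prob/lengths(l):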
l =list(c("A","B"),c(1,2,3),c("q","v","z","w"))
set.seed(1)
sample(unlist(l), n, replace = TRUE, prob= rep(prob/lengths(l), lengths(l)))
# [1] "3" "2" "1" "v" "3" "B" "q"
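To confirm that prob/lengths(l) keeps each list element's total probability equal to the corresponding entry of prob, here is a small check (a sketch; the helper vectors group and group_total are introduced only for this illustration):

```r
l <- list(c("A","B"), c(1,2,3), c("q","v","z","w"))
prob <- c(0.1, 0.8, 0.1)

# Spread each element's probability evenly over its own items:
# 0.05 for each of the 2 letters, about 0.2667 for each number, 0.025 for each of the 4 letters
w <- rep(prob / lengths(l), lengths(l))

# Adding the per-item weights back up within each list element recovers prob
group <- rep(seq_along(l), lengths(l))
group_total <- vapply(split(w, group), sum, numeric(1))
unname(group_total)
# [1] 0.1 0.8 0.1
```

Because prob already sums to 1, these weights sum to 1 as well, although sample() would rescale them anyway.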
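This approach assumes that, within a column, every element is equally likely to be sampled.
We first sample n column positions using the probabilities in the vector prob: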
df=cbind(c("A","B","A"),c(1,2,3),c("q","v","z"))
prob=c(0.1,0.8,0.1)
n = 10
set.seed(1)
colselect <- sample(1:ncol(df), size = n, replace = TRUE, prob = prob)
# [1] 2 2 2 1 2 3 1 2 2 2
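As a quick sanity check of this first stage, drawing many column positions should reproduce prob as observed frequencies. A sketch, with an arbitrary seed and sample size:

```r
prob <- c(0.1, 0.8, 0.1)

# Draw many column positions and compare observed frequencies with prob
set.seed(123)
cols <- sample(1:3, size = 1e5, replace = TRUE, prob = prob)
observed <- tabulate(cols, nbins = 3) / length(cols)
observed  # should be close to 0.1, 0.8, 0.1
```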
Then we iterate over the column positions and sample one element from the corresponding column for each:
sapply(colselect, function(x) sample(df[,x], 1))
# [1] "1" "1" "3" "B" "3" "v" "A" "3" "2" "3"
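For a matrix, both stages can also be done in one vectorised step: draw the columns with prob, draw the rows uniformly, and pick the cells with matrix indexing. This is a sketch under the same uniform-within-column assumption, and sample_two_stage is a hypothetical helper name, not from the original answer. Using sample.int() also sidesteps the sample() gotcha where a single numeric value x >= 1 is treated as 1:x. The random stream is consumed differently from the sapply() loop, so the draws will generally not match the output above.

```r
# Hypothetical helper: two-stage sampling for a matrix whose columns all have nrow(m) rows,
# assuming every row is equally likely within the chosen column
sample_two_stage <- function(m, n, prob) {
  cols <- sample.int(ncol(m), n, replace = TRUE, prob = prob)  # stage 1: pick a column per draw
  rows <- sample.int(nrow(m), n, replace = TRUE)               # stage 2: uniform row, independent of the column
  m[cbind(rows, cols)]                                         # one cell per (row, col) pair
}

m <- cbind(c("A","B","A"), c(1,2,3), c("q","v","z"))
prob <- c(0.1, 0.8, 0.1)

set.seed(1)
sample_two_stage(m, 10, prob)

# Share of draws from column 2 ("1", "2", "3" appear only in that column)
set.seed(7)
big <- sample_two_stage(m, 1e5, prob)
mean(big %in% m[, 2])  # should be close to 0.8
```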