Random samples from each column of a data.frame
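I want to draw random samples from each row of a data.frame, independently of the other rows. Here is an example. This code selects the same columns for every row, but I need the columns to be selected independently for each row.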
library(plyr)
set.seed(12345)
df1 <- mdply(data.frame(mean=c(10, 15)), rnorm, n = 5, sd = 1)
df1
mean V1 V2 V3 V4 V5
1 10 10.58553 10.70947 9.890697 9.546503 10.60589
2 15 13.18204 15.63010 14.723816 14.715840 14.08068
> df1[ , -1]
V1 V2 V3 V4 V5
1 10.58553 10.70947 9.890697 9.546503 10.60589
2 13.18204 15.63010 14.723816 14.715840 14.08068
> sample(df1[, -1], replace = TRUE)
V3 V2 V5 V4 V4.1
1 9.890697 10.70947 10.60589 9.546503 9.546503
2 14.723816 15.63010 14.08068 14.715840 14.715840
> t(apply(df1[, -1], 1, sample))
[,1] [,2] [,3] [,4] [,5]
[1,] 10.70947 9.890697 10.60589 10.58553 9.546503
[2,] 14.71584 13.182044 14.08068 15.63010 14.723816
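The following is a minimal base-R sketch of why the two attempts above behave the way they do. It builds a stand-in for df1 without plyr. The values are random and are not meant to reproduce the output shown. sample() on a data.frame draws whole columns, so every row gets the same column selection. apply() with sample and no replace = TRUE permutes each row on its own, so each row keeps exactly its original values with no repeats.

```r
# Base-R stand-in for the question's df1 (the question builds it with plyr::mdply)
set.seed(12345)
vals <- t(sapply(c(10, 15), function(m) rnorm(5, mean = m, sd = 1)))
df1 <- data.frame(mean = c(10, 15), vals)
names(df1)[-1] <- paste0("V", 1:5)
x <- df1[, -1]

# sample() on a data.frame resamples whole columns: every row gets the same columns
shared <- sample(x, replace = TRUE)
# Strip suffixes like ".1" (from "V4.1") to recover each sampled column's source
src <- sub("\\..*$", "", names(shared))
stopifnot(identical(unname(as.matrix(shared)), unname(as.matrix(x[, src]))))

# apply() + sample() without replace = TRUE permutes each row independently
perm <- t(apply(x, 1, sample))
for (i in seq_len(nrow(x))) {
  # Each permuted row holds exactly the original values of that row
  stopifnot(identical(sort(unname(perm[i, ])), sort(unname(unlist(x[i, ])))))
}
```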
Edit:
df1[ , -1]
V1 V2 V3 V4 V5
1 10.58553 10.70947 9.890697 9.546503 10.60589
2 13.18204 15.63010 14.723816 14.715840 14.08068
sample(df1[, -1], replace = TRUE)
V3 V2 V5 V4 V4.1
1 9.890697 10.70947 10.60589 9.546503 9.546503
2 14.723816 15.63010 14.08068 14.715840 14.715840
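sample(df1[, -1], replace = TRUE) selects columns V3, V2, V5, V4 and V4 for both rows. But I need it to be able to select, for example, columns V3, V2, V5, V4 and V4 for the first row and any combination of the five columns for the second row.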
You can use apply and pass replace = TRUE through to sample:
t(apply(df1[,-1], 1, sample, replace=TRUE))
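The following is a hedged sketch that wraps the apply() answer in a reusable function. The name sample_within_rows and the toy data are hypothetical and are used only for illustration. apply() coerces a data.frame to a matrix first, so this assumes all sampled columns share one type (here, numeric).

```r
# Hypothetical helper: resample each row's values independently, with replacement
sample_within_rows <- function(x, replace = TRUE) {
  m <- as.matrix(x)  # apply() would coerce a data.frame to a matrix anyway
  # apply() returns one column per input row, so transpose back
  out <- t(apply(m, 1, sample, replace = replace))
  # Output positions no longer correspond to the original columns, so drop their names
  dimnames(out) <- list(rownames(m), NULL)
  out
}

# Illustrative toy data (not the question's values): row 1 is 1:5, row 2 is 11:15
x <- as.data.frame(matrix(c(1:5, 11:15), nrow = 2, byrow = TRUE))
set.seed(1)
res <- sample_within_rows(x)
res
```

One caveat: sample() on a numeric vector of length one samples from 1:x instead, so this helper is only safe when there are at least two columns.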
You can sample all the column indices at once and then use matrix subsetting, which avoids having to use apply:
## Determine how many indices are required (nrow x (ncol - 1))
nsamp <- prod(dim(df1[, -1]))
## Sample from the number of desired columns, here 5 = ncol(df1[, -1])
mySamp <- sample.int(5, nsamp, replace = TRUE)
## Create a matrix of row and column indices
## Have to add 1 to mySamp to ignore first column of df1
myIdx <- cbind(rep(seq_len(nrow(df1)), ncol(df1) - 1), mySamp + 1)
## Return the corresponding values
matrix(df1[myIdx], nrow = nrow(df1))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 9.890697 10.60589 9.546503 9.546503 10.70947
# [2,] 15.630099 14.71584 15.630099 14.723816 14.72382
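The following sketch generalizes the matrix-indexing answer so the column count is not hard-coded as 5. The function name sample_within_rows_idx and the toy data are hypothetical. The function also returns the drawn column indices, which makes it easy to check that cell [i, k] really came from row i. It only works with replacement, because it draws nrow(x) * ncol(x) indices from ncol(x) columns.

```r
# Hypothetical helper: vectorized per-row resampling via matrix indexing
sample_within_rows_idx <- function(x) {
  m <- as.matrix(x)
  n <- nrow(m)
  p <- ncol(m)
  # One column index per output cell, drawn with replacement
  cols <- sample.int(p, n * p, replace = TRUE)
  # Row indices 1..n repeated p times line up with matrix()'s column-major fill
  idx <- cbind(rep(seq_len(n), times = p), cols)
  list(values = matrix(m[idx], nrow = n),
       cols   = matrix(cols, nrow = n))
}

# Illustrative toy data (not the question's values): row 1 is 1:5, row 2 is 11:15
x <- as.data.frame(matrix(c(1:5, 11:15), nrow = 2, byrow = TRUE))
set.seed(1)
res <- sample_within_rows_idx(x)
res$values
```

Compared with the apply() version, this makes a single call to sample.int() and one indexing operation, which tends to scale better with many rows. The trade-off is that it cannot sample without replacement.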