Storing simulation results as data.table in R

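I have to run a lot of simulations, and they take a long time. I think data.table could reduce the processing time. How can I store the result of mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2) in a data.table without first saving its output as a data.frame?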

library(plyr)
df1 <- mdply(data.frame(prob=seq(from = 0.1, to = 0.9, by = 0.1)), rbinom, n = 5, size = 2)
library(data.table)
dt1 <- data.table(df1)

Edited

I know I can use setDT(df1) to avoid creating dt1. The main problem, however, is mdply itself: it builds a data.frame, and that takes a lot of time.
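For reference, the in-place conversion mentioned above can be sketched as follows. The small data.frame is an illustrative stand-in for the mdply output, with made-up values, not data from the question.

```r
library(data.table)

# Illustrative stand-in for the mdply() result (values are made up).
df1 <- data.frame(prob = c(0.1, 0.2), V1 = c(0L, 1L), V2 = c(1L, 0L))

# setDT() converts df1 to a data.table by reference, so no second
# object such as dt1 is created.
setDT(df1)

is.data.table(df1)
```

This removes the copy made by data.table(df1), but it does nothing about the time spent inside mdply.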

plyr and data.table serve very similar purposes, so you usually don't need to switch back and forth between them at all. In this case, you can do everything with data.table:

dt = data.table(prob = seq(0.1, 0.9, by = 0.1))
dt = dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob]
dt
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  0  0  0  0  1
3:  0.3  1  2  1  0  1
4:  0.4  1  1  2  1  0
5:  0.5  2  2  1  1  1
6:  0.6  1  1  0  0  1
7:  0.7  2  1  2  1  0
8:  0.8  2  1  2  0  1
9:  0.9  2  2  2  2  2
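To see what as.list() contributes in the grouped call, here is a minimal sketch comparing the result with and without it. The seed and the names wide and long are illustrative assumptions, not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumed seed, only to make the draws repeatable

dt <- data.table(prob = seq(0.1, 0.9, by = 0.1))

# With as.list(): the 5 draws of each group become 5 separate columns.
wide <- dt[, as.list(rbinom(n = 5, size = 2, prob = prob)), by = prob]

# Without as.list(): the 5 draws of each group are stacked as 5 rows.
long <- dt[, rbinom(n = 5, size = 2, prob = prob), by = prob]

dim(wide)  # 9 rows, 6 columns
dim(long)  # 45 rows, 2 columns
```

Inside each group, prob is a single value, so rbinom returns five draws for that probability. as.list() is what turns those five values into the columns V1 to V5.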

I would add that my intuition is that the fastest approach is to build the matrix first and then attach it as columns.

> mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2)
> cbind(dt, t(mat))
   prob V1 V2 V3 V4 V5
1:  0.1  0  0  0  0  0
2:  0.2  1  0  0  1  1
3:  0.3  1  1  1  0  0
4:  0.4  1  0  2  1  1
5:  0.5  1  1  1  0  2
6:  0.6  2  0  2  1  1
7:  0.7  1  1  1  2  1
8:  0.8  1  2  1  0  2
9:  0.9  1  1  2  1  1
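Because rbinom is itself vectorised over prob, the mapply call can probably be dropped altogether. The sketch below is a variation on the matrix idea rather than the code from the answer; sim_wide is a hypothetical helper name, and this variant was not part of the timings in the answer, so any speed gain is untested here.

```r
library(data.table)

# Hypothetical helper (not from the original answer): returns one row
# per probability with n_draws binomial draws in that row.
sim_wide <- function(p, n_draws = 5, size = 2) {
  # rbinom() is vectorised over prob: draw i uses prob[i], so repeating
  # each probability n_draws times gives n_draws consecutive draws for it.
  draws <- rbinom(n_draws * length(p), size = size,
                  prob = rep(p, each = n_draws))
  # Fill row by row so that row i holds the draws for p[i].
  m <- matrix(draws, ncol = n_draws, byrow = TRUE)
  cbind(data.table(prob = p), m)
}

res <- sim_wide(seq(0.1, 0.9, by = 0.1))
dim(res)  # 9 rows, 6 columns
```

A quick sanity check of the row alignment is sim_wide(c(0, 1)): the first row should be all 0 and the second all 2, since those probabilities make the outcome deterministic.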

A very quick test on a table of roughly 80,000 rows shows that this is faster:

> dt = data.table(prob = (seq(0.1, 0.9, by = 0.00001)))
> system.time(for(i in 1:10) dt[, as.list(rbinom(prob, n = 5, size = 2)), by = prob])
   user  system elapsed 
   6.14    0.00    6.16 
> system.time(for(i in 1:10) {mat = mapply(rbinom, prob = dt$prob, n = 5, size = 2) ; cbind(dt, t(mat))})
   user  system elapsed 
   2.61    0.00    2.62 

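And both are a substantial improvement over the original (df is assumed here to be a data.frame holding the same prob column as dt):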

> system.time(for(i in 1:10) {df1 = mdply(df, rbinom, n = 5, size = 2) ; dt1 = data.table(df1)})
   user  system elapsed 
 152.23   46.60  200.07