How to make the loop faster?

Question

My code is shown below, and I would like to know whether there is a better way to make it faster:

pos=NULL
row=data.frame(matrix(nrow=216,ncol=4))
colnames(row)=c("sub","subi","group","trial")
for (i in 1:100000){
  row$sub="Positive"
  row$subi=NA
  row$group=NA
  row$subi[1:144]=c(1:144)
  row$group[1:144]=1
  row$subi[145:216]=c(1:72)
  row$group[145:216]=2
  row$trial=i
  pos=rbind(pos,row)
}

Answer 1

No loop is needed. You can build the data.frame or tibble (my example) directly.

Given that you want to adjust the number of rows later:

library(dplyr)

n_rows <- 10000

tibble(
  trial = 1:n_rows,
  sub = "positive",
  subi = c(1:144, 1:72, rep(NA, n_rows - 216)),
  group = c(rep(1, 144), rep(2, 72), rep(NA, n_rows - 216))
)

The output is:

# A tibble: 10,000 × 4
   trial sub       subi group
   <int> <chr>    <int> <dbl>
 1     1 positive     1     1
 2     2 positive     2     1
 3     3 positive     3     1
 4     4 positive     4     1
 5     5 positive     5     1
 6     6 positive     6     1
 7     7 positive     7     1
 8     8 positive     8     1
 9     9 positive     9     1
10    10 positive    10     1
# … with 9,990 more rows

Answer 2

It looks like you are trying to replicate this data frame 100,000 times, with each copy getting a different trial number.

data.frame(sub = rep("Positive", 216), 
           subi = c(1:144, 1:72), 
           group = rep(c(1, 2), c(144, 72)))

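The replicate function is well suited to running the same static code many times. So one option is to create the copies first (100,000 in your case; the code here uses n = 100 to keep the example small) and then update the trial number.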

FrameList <- 
  replicate(n = 100, 
            {
              data.frame(sub = rep("Positive", 216), 
                         subi = c(1:144, 1:72), 
                         group = rep(c(1, 2), c(144, 72)), 
                         trial = rep(NA_real_, 216))
            }, 
            simplify = FALSE)

To update the trial number, you can use a for loop

for (i in seq_along(FrameList)){
  FrameList[[i]]$trial <- i
}

Or you can try something fancy-pants, though it takes more code

FrameList <- mapply(function(FL, i){
                      FL$trial <- i 
                      FL
                    },
                    FrameList, 
                    seq_along(FrameList), 
                    SIMPLIFY = FALSE)

Whichever route you take, you can stack them together with

Frame <- do.call("rbind", FrameList)

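This is certainly not the most elegant way to do it, so keep an eye out for the clever tricks other people may offer. But I think this is the basic process to follow.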
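As a rough sketch of the same process in fewer steps, each frame can be built with its trial number already filled in, so no separate update pass is needed. This is only an illustration: n_trials is a hypothetical small value chosen to keep it quick, and lapply stands in for replicate because it passes the index to the function.

```r
# Sketch only: n_trials is a hypothetical small value, not the 100,000
# from the question.
n_trials <- 3

# Build one 216-row frame per trial; the length-1 values for sub and
# trial are recycled to 216 rows by data.frame().
FrameList <- lapply(seq_len(n_trials), function(i) {
  data.frame(sub = "Positive",
             subi = c(1:144, 1:72),
             group = rep(c(1, 2), c(144, 72)),
             trial = i)
})

# Stack all frames in one call instead of growing a frame inside a loop.
Frame <- do.call("rbind", FrameList)
dim(Frame)
```

With three trials of 216 rows each, Frame should have 648 rows and 4 columns.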

Answer 3

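The only thing that differs between loop iterations is trial. rep is your friend. For the other columns, R automatically recycles them to match the longest column (here trial, with 21.6 million rows).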

pos <- data.frame(
  sub = "Positive",
  subi = c(1:144, 1:72),
  group = rep.int(1:2, c(144, 72)),
  trial = rep(1:1e5, each = 216)
)
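One way to convince yourself that recycling gives the same result as the loop is to compare both on a small case. The sketch below assumes a hypothetical n_trials of 5 and uses a simplified version of the loop from the question; it is a sanity check, not a benchmark.

```r
# Sketch only: n_trials is a hypothetical small value for a quick check.
n_trials <- 5

# Vectorised construction: sub, subi and group are recycled to the
# length of trial (n_trials * 216 rows).
vectorised <- data.frame(
  sub = "Positive",
  subi = c(1:144, 1:72),
  group = rep.int(1:2, c(144, 72)),
  trial = rep(seq_len(n_trials), each = 216)
)

# Loop construction, growing the frame with rbind on every iteration.
looped <- NULL
for (i in seq_len(n_trials)) {
  block <- data.frame(sub = "Positive",
                      subi = c(1:144, 1:72),
                      group = rep.int(1:2, c(144, 72)),
                      trial = i)
  looped <- rbind(looped, block)
}

# Compare dimensions and then every cell.
same_shape <- identical(dim(vectorised), dim(looped))
same_values <- all(vectorised == looped)
same_shape && same_values
```

The requirement to keep in mind is that data.frame only recycles a column when its length divides the longest column's length evenly, which holds here because trial has a multiple of 216 rows.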
