Simulation in R with nested loops runs slowly

Question

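I'm using R for an agent-based historical simulation, and the code runs very slowly. It loops through time steps. Each step updates a data frame of agent attributes, plus a second data frame that records the overall state of the population after each time step (generation). Around that loop is an outer loop that does several runs for each parameter setting.

The simulation starts with 100 agents. Under extreme settings (high S, low A), however, the population can grow to more than a thousand after, for example, five generations. I read that updating a matrix is faster than updating a data frame, so I converted the summary to a matrix. I have also heard that vectorization is best. Before converting the agents to a matrix as well, can anyone suggest a way to vectorize this further? Here is the code: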

NextGeneration <- function(agent, N, S, A) {
   # N is number of agents.
   # S is probability that an agent with traditional fertility will have 2 sons surviving to the age of inheritance.
   # A is probability that an heir experiencing division of estate changes his fertility preference from traditional to planned.
   # find number of surviving heirs for each agent
   excess <- runif(N)  # get random numbers 
   heir <- rep(1, N)  # everyone has at least 1 surviving heir 

   # if agent has traditional fertility 2 heirs may survive to inherit
   heir[agent$fertility == "Trad" & excess < S] <- 2  

   # next generation more numerous if spare heirs survive

   # new agents have vertical inheritance but also guided variation. 
   # first append to build a vector, then combine into new agent dataframe  
   nextgen.fertility <- NULL
   nextgen.lineage <- NULL

   for (i in 1:N) {

      if (heir[i]==2) {

         # two agents inherit from one parent.
         for (j in 1:2) {

            # A is probability of inheritance division event affecting fertility preference in new generation.
            if (A > runif(1)) {
               nextgen.fertility <- c(nextgen.fertility, "Plan") 
            } else {
               nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
            }
            nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
         }
      } else {
         nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
         nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
      }
   }
   # assemble new agent frame  
   nextgen.agent <- data.frame(nextgen.fertility, nextgen.lineage, stringsAsFactors = FALSE) 
   names(nextgen.agent) <- c("fertility", "lineage")
   nextgen.agent
}

So the agents start like this (Trad = traditional):

ID      fertility   lineage
1       Trad        1
2       Trad        2
3       Trad        3
4       Trad        4
5       Trad        5

After several time steps (generations) of random changes, it looks like this:

ID      fertility   lineage
1       Plan       1
2       Plan       1
3       Trad       2
4       Plan       3
5       Trad       3
6       Trad       4
7       Plan       4
8       Plan       4
9       Plan       4
10      Plan       5
11      Trad       5
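The question's second data frame records the state of the population after each generation. One way to compute such a summary without a loop is a cross-tabulation with `table()`. The sketch below rebuilds the example population above. The summary fields `size` and `plan_share` are illustrative assumptions, not the asker's actual summary layout.

```r
# Example population from the table above (after several generations)
agent <- data.frame(
  fertility = c("Plan", "Plan", "Trad", "Plan", "Trad", "Trad",
                "Plan", "Plan", "Plan", "Plan", "Trad"),
  lineage   = c(1, 1, 2, 3, 3, 4, 4, 4, 4, 5, 5),
  stringsAsFactors = FALSE
)

# Counts of each fertility preference within each lineage, in one vectorized call
by_lineage <- table(agent$lineage, agent$fertility)
print(by_lineage)

# Hypothetical population-level state for this generation:
# number of agents and the share of agents with planned fertility
summary_row <- c(size = nrow(agent),
                 plan_share = mean(agent$fertility == "Plan"))
print(summary_row)
```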

Answer 1

Indeed, coding fertility as 0 and 1 would be more efficient, and you could even store the agents in an integer matrix.
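Here is a minimal sketch of that idea. It assumes fertility is coded as `1L` for Trad and `0L` for Plan, and that the agents are held in a two-column integer matrix. The function `next_gen_mat`, the column names and the parameter values are all hypothetical. The sketch also shows the outer generation loop writing into a preallocated summary matrix instead of growing an object inside the loop.

```r
# Hypothetical encoding: fertility 1L = Trad, 0L = Plan; one row per agent.
next_gen_mat <- function(agent, S, A) {
  N <- nrow(agent)
  # traditional agents whose second son survives (probability S)
  two <- which(agent[, "fertility"] == 1L & runif(N) < S)
  # keep every agent once and append a second copy of each two-heir agent
  child <- agent[c(seq_len(N), two), , drop = FALSE]
  if (length(two) > 0) {
    # both heirs of a divided estate: Plan (0L) with probability A, else Trad (1L)
    idx <- c(two, N + seq_along(two))
    child[idx, "fertility"] <- as.integer(runif(length(idx)) >= A)
  }
  child
}

set.seed(1)
pop <- cbind(fertility = rep(1L, 100), lineage = seq_len(100))
G <- 5
# preallocate the per-generation summary instead of growing it inside the loop
state <- matrix(NA_real_, nrow = G, ncol = 2,
                dimnames = list(NULL, c("size", "plan_share")))
for (g in seq_len(G)) {
  pop <- next_gen_mat(pop, S = 0.9, A = 0.1)
  state[g, ] <- c(nrow(pop), mean(pop[, "fertility"] == 0L))
}
print(state)
```

Only the agents with two heirs are touched in each generation. Everything else is plain integer indexing, which avoids the repeated copying caused by `c()` inside the original loop.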

In any case, the current code can be simplified a lot. Here is a vectorized solution that still uses your data.frame:

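NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  # traditional agents whose second heir survives
  v1 <- which(agent$fertility == "Trad" & excess < S)
  # keep every agent once and append a duplicate row for each two-heir agent
  nextgen.agent <- agent[c(seq_len(N), v1), ]
  if (length(v1) > 0) {
    # both heirs of each two-heir (hence "Trad") agent become "Plan" with probability A
    idx <- c(v1, N + seq_along(v1))
    nextgen.agent[idx, "fertility"] <- ifelse(A > runif(length(idx)), "Plan", "Trad")
  }
  nextgen.agent
}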

Testing with a sample agent data frame:

agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)

# use microbenchmark library to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8) # note I fixed the two variable typos in your function
  }, 
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  }, 
  times = 100
)

## Unit: microseconds
## expr      min        lq     mean    median       uq       max neval
## base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##  new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
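Here is a different way to vectorize the same step, sketched as a hypothetical `NextGenRep` (not part of the answer). `rep()` with a per-agent `times` vector repeats each parent's row once per heir. Both heirs therefore stay next to their parent, in the same row order the original `for` loop produces. The sketch assumes the same `fertility`/`lineage` columns and probability semantics as `NextGeneration`. It has not been benchmarked here.

```r
NextGenRep <- function(agent, N, S, A) {
  # 2 surviving heirs for a traditional agent with probability S, otherwise 1
  heir <- ifelse(agent$fertility == "Trad" & runif(N) < S, 2L, 1L)
  # repeat each parent's row once per heir, keeping children next to their parent
  nextgen.agent <- agent[rep(seq_len(N), heir), , drop = FALSE]
  # children of a divided estate each switch to "Plan" with probability A
  divided <- rep(heir == 2L, heir)
  switched <- divided & runif(nrow(nextgen.agent)) < A
  nextgen.agent$fertility[switched] <- "Plan"
  rownames(nextgen.agent) <- NULL  # renumber rows 1..n
  nextgen.agent
}

set.seed(42)
agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)
res3 <- NextGenRep(agentDF, 50, 0.8, 0.8)
print(head(res3))
```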
