Simulation in R with nested loops runs slow
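I am running an agent-based historical simulation in R, and the code is slow. It loops over time steps and updates a data frame of agent attributes. After each time step (generation), it also updates a second data frame that summarizes the overall state. Around that loop, there are several runs for each parameter setting. The simulation starts with 100 agents, but under extreme settings (high S, low A) the population can grow to over a thousand after, say, five generations. I read that updating a matrix is faster than updating a data frame, so I converted the summary to a matrix. I have also heard that vectorization is best, though. Before I convert the agents to a matrix as well, could anyone suggest a way to make this more vectorized? Here is the code: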
NextGeneration <- function(agent, N, S, A) {
  # N is number of agents.
  # S is probability that an agent with traditional fertility will have 2 sons surviving to the age of inheritance.
  # A is probability that an heir experiencing division of estate changes his fertility preference from traditional to planned.
  # find number of surviving heirs for each agent
  excess <- runif(N)  # get random numbers
  heir <- rep(1, N)   # everyone has at least 1 surviving heir
  # if agent has traditional fertility 2 heirs may survive to inherit
  heir[agent$fertility == "Trad" & excess < S] <- 2
  # next generation more numerous if spare heirs survive
  # new agents have vertical inheritance but also guided variation.
  # first append to build a vector, then combine into new agent dataframe
  nextgen.fertility <- NULL
  nextgen.lineage <- NULL
  for (i in 1:N) {
    if (heir[i] == 2) {
      # two agents inherit from one parent.
      for (j in 1:2) {
        # A is probability of inheritance division event affecting fertility preference in new generation.
        if (A > runif(1)) {
          nextgen.fertility <- c(nextgen.fertility, "Plan")
        } else {
          nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
        }
        nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
      }
    } else {
      nextgen.fertility <- c(nextgen.fertility, agent$fertility[i])
      nextgen.lineage <- c(nextgen.lineage, agent$lineage[i])
    }
  }
  # assemble new agent frame
  nextgen.agent <- data.frame(nextgen.fertility, nextgen.lineage, stringsAsFactors = FALSE)
  names(nextgen.agent) <- c("fertility", "lineage")
  nextgen.agent
}
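A smaller change than full vectorization may already help here. The loop above grows `nextgen.fertility` and `nextgen.lineage` with `c()`, and each call copies the whole vector, so the cost grows roughly quadratically with population size. Since the output size is `sum(heir)` and is known before the loop starts, the vectors can be preallocated and filled by position instead. The following is a minimal sketch under that idea, not benchmarked. `NextGenerationPrealloc` is a hypothetical name, and it assumes `fertility` is stored as character, as in the original.

```r
# Hypothetical variant of NextGeneration: same logic, but the output vectors
# are preallocated instead of being grown with c() on every iteration.
NextGenerationPrealloc <- function(agent, N, S, A) {
  excess <- runif(N)
  heir <- rep(1L, N)                          # everyone has at least 1 heir
  heir[agent$fertility == "Trad" & excess < S] <- 2L

  total <- sum(heir)                          # final size is known up front
  nextgen.fertility <- character(total)
  nextgen.lineage <- vector(typeof(agent$lineage), total)

  k <- 0L                                     # current write position
  for (i in seq_len(N)) {
    for (j in seq_len(heir[i])) {
      k <- k + 1L
      # only heirs of a divided estate may switch to "Plan";
      # && skips the runif() call when there is a single heir
      if (heir[i] == 2L && A > runif(1)) {
        nextgen.fertility[k] <- "Plan"
      } else {
        nextgen.fertility[k] <- agent$fertility[i]
      }
      nextgen.lineage[k] <- agent$lineage[i]
    }
  }
  data.frame(fertility = nextgen.fertility, lineage = nextgen.lineage,
             stringsAsFactors = FALSE)
}
```

It is called like the original, e.g. `NextGenerationPrealloc(agentDF, nrow(agentDF), 0.8, 0.8)`. It should draw random numbers in the same order as `NextGeneration`, so with the same seed it should return the same generation.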
So the agents start like this (Trad = traditional):
ID fertility lineage
 1      Trad       1
 2      Trad       2
 3      Trad       3
 4      Trad       4
 5      Trad       5
After a few time steps (generations) of random changes, they look like this:
ID fertility lineage
 1      Plan       1
 2      Plan       1
 3      Trad       2
 4      Plan       3
 5      Trad       3
 6      Trad       4
 7      Plan       4
 8      Plan       4
 9      Plan       4
10      Plan       5
11      Trad       5
Indeed, coding fertility as 0 and 1 would be more efficient, and you could even use an integer matrix.
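A minimal sketch of the integer-matrix idea follows. It assumes a hypothetical coding of `0L` for "Trad" and `1L` for "Plan", with the agents stored as a two-column integer matrix. `NextGenMatrix` and `agentM` are illustrative names, and the labels are converted back to text only for display.

```r
# Hypothetical encoding: fertility 0L = "Trad", 1L = "Plan"; lineage is an
# integer ID. Agents are rows of a two-column integer matrix.
NextGenMatrix <- function(agent, S, A) {
  N <- nrow(agent)
  # parents with traditional fertility whose second heir survives
  two <- agent[, "fertility"] == 0L & runif(N) < S
  # parent row for every heir: once, or twice if `two` is TRUE
  idx <- rep(seq_len(N), times = 1L + two)
  nextgen <- agent[idx, , drop = FALSE]
  # each heir of a divided estate becomes "Plan" (1L) with probability A
  split <- two[idx]
  nextgen[split, "fertility"] <- as.integer(runif(sum(split)) < A)
  nextgen
}

# 100 traditional agents, each founding its own lineage
agentM <- cbind(fertility = 0L, lineage = seq_len(100L))
nextM <- NextGenMatrix(agentM, S = 0.8, A = 0.8)
# convert to labels for display only
table(ifelse(nextM[, "fertility"] == 1L, "Plan", "Trad"))
```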
In any case, the current code can be simplified a lot, so here is a vectorized solution that still uses your data.frame:
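NextGen <- function(agent, N, S, A) {
  excess <- runif(N)
  # rows of traditional agents that get a second surviving heir
  v1 <- which(agent$fertility == "Trad" & excess < S)
  # duplicate those rows; the copies are appended after the first N rows
  nextgen.agent <- agent[c(1:N, v1), ]
  # both heirs of each divided estate switch to "Plan" with probability A.
  # N + seq_along(v1) is empty when v1 is empty, whereas
  # seq.int(N + 1, nrow(nextgen.agent)) would count down to c(N + 1, N)
  nextgen.agent[c(v1, N + seq_along(v1)), "fertility"] <-
    ifelse(A > runif(length(v1) * 2), "Plan", "Trad")
  nextgen.agent
}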
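`NextGen` appends the extra heirs at the end of the data frame. If each parent's heirs should sit in adjacent rows, as the original loop produces them, `rep()` with a per-element `times` vector can do the duplication in one step. The sketch below is a hypothetical alternative and has not been benchmarked here. `NextGenRep` is an illustrative name, and its `N` argument defaults to `nrow(agent)` so that the two cannot drift apart.

```r
# Hypothetical alternative to NextGen: heirs of the same parent end up in
# adjacent rows, as in the original loop.
NextGenRep <- function(agent, N = nrow(agent), S, A) {
  # parents with traditional fertility whose second heir survives
  two <- agent$fertility == "Trad" & runif(N) < S
  # parent row for every heir: once, or twice if `two` is TRUE
  idx <- rep(seq_len(N), times = 1L + two)
  nextgen <- agent[idx, , drop = FALSE]
  # heirs of a divided estate: "Plan" with probability A, else the parent's "Trad"
  split <- two[idx]
  nextgen$fertility[split] <- ifelse(runif(sum(split)) < A, "Plan", "Trad")
  rownames(nextgen) <- NULL  # 1, 2, ... instead of "1", "1.1", ...
  nextgen
}
```

It accepts the same positional arguments as `NextGen`, e.g. `NextGenRep(agentDF, 50, 0.8, 0.8)`, so it could be added as a third expression to the benchmark.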
Testing with a sample agentDF:
agentDF <- data.frame(fertility = "Trad", lineage = 1:50, stringsAsFactors = FALSE)
# use the microbenchmark package to compare performance
microbenchmark::microbenchmark(
  base = {
    res1 <- NextGeneration(agentDF, 50, 0.8, 0.8)  # note I fixed the two variable typos in your function
  },
  new = {
    res2 <- NextGen(agentDF, 50, 0.8, 0.8)
  },
  times = 100
)
## Unit: microseconds
##  expr      min        lq     mean    median       uq       max neval
##  base 1998.533 2163.8605 2446.561 2222.8200 2286.844 14413.173   100
##   new  282.032  304.1165  329.552  320.3255  348.488   467.217   100
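The question also describes an outer loop: several generations per run, a summary after each generation, and several parameter settings. Below is a sketch of such a driver under those assumptions. The summary is written into a matrix preallocated to one row per generation instead of being grown. `NextGenStep`, `RunSimulation`, the summary columns and the parameter grid are all hypothetical. The one-generation step repeats the `rep()` idea so that the block runs on its own.

```r
# Hypothetical one-generation step (rep()-based), repeated here so this
# block is self-contained.
NextGenStep <- function(agent, S, A) {
  N <- nrow(agent)
  two <- agent$fertility == "Trad" & runif(N) < S
  idx <- rep(seq_len(N), times = 1L + two)
  nextgen <- agent[idx, , drop = FALSE]
  split <- two[idx]
  nextgen$fertility[split] <- ifelse(runif(sum(split)) < A, "Plan", "Trad")
  rownames(nextgen) <- NULL
  nextgen
}

# Hypothetical driver: G generations for one (S, A) setting, recording one
# row of summary statistics per generation in a preallocated matrix.
RunSimulation <- function(S, A, n0 = 100L, G = 5L) {
  agent <- data.frame(fertility = rep("Trad", n0), lineage = seq_len(n0),
                      stringsAsFactors = FALSE)
  summ <- matrix(NA_real_, nrow = G, ncol = 3,
                 dimnames = list(NULL, c("generation", "size", "prop_plan")))
  for (g in seq_len(G)) {
    agent <- NextGenStep(agent, S, A)
    summ[g, ] <- c(g, nrow(agent), mean(agent$fertility == "Plan"))
  }
  summ
}

# One summary matrix per parameter setting
params <- expand.grid(S = c(0.2, 0.8), A = c(0.2, 0.8))
results <- lapply(seq_len(nrow(params)), function(k)
  RunSimulation(S = params$S[k], A = params$A[k]))
```

Repeated runs per setting could be added by wrapping the `RunSimulation()` call in `replicate()` or another `lapply()`.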