How can I add code between the inner and outer loops of nested foreach loops in R?
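I've read that the proper way to run nested foreach loops in R is with the nesting operator %:% (e.g. https://cran.r-project.org/web/packages/foreach/vignettes/nested.html).
With that approach, however, no code can be placed between the inner and outer loops; see the example below.
Is there a way to create nested, parallel foreach loops such that code can be added between the inner and outer loops?
More generally, is there anything wrong with the obvious approach that comes to mind, namely nesting the two foreach loops with the %dopar% operator instead of the %:% operator? See the trivial example below.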
library(foreach)
library(doParallel)  # provides registerDoParallel(); also attaches parallel for makeCluster()
# Set up backend
cl = makeCluster(6)
registerDoParallel(cl)
on.exit(stopCluster(cl))
# Run nested loop with '%:%' operator. Breaks if adding code between the inner and outer loops
foreach(i=1:2) %:%
  # a = 1 #trivial example of running code between outer and inner loop -- throws error
  foreach(j = 1:3) %dopar% {
    i * j
  }
# Run nested loop using 2 '%dopar%' statements -- is there anything wrong with this?
foreach(i=1:2, .packages = 'foreach') %dopar% {
  a = 1 #trivial example of running code between outer and inner loop
  foreach(j = 1:3) %dopar% {
    i * j
  }
}
The section "Using %:% with %dopar%" in the documentation you linked gives a useful hint:
all of the tasks are completely independent of each other, and so they can all be executed in parallel
The %:% operator turns multiple foreach loops into a single loop. That is why there is only one %do% operator in the example above. And when we parallelize that nested foreach loop by changing the %do% into a %dopar%, we are creating a single stream of tasks that can all be executed in parallel.
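To make the "single loop" behaviour from the quote concrete, here is a minimal sketch. It uses %do% so that no parallel backend is assumed, and the .combine = "c" arguments are an addition for readability, not part of the question's code.

```r
library(foreach)

# Six independent (i, j) tasks are generated by the two chained foreach() calls.
# .combine = "c" (an illustrative choice) flattens the results into one vector.
res <- foreach(i = 1:2, .combine = "c") %:%
  foreach(j = 1:3, .combine = "c") %do% {
    i * j
  }

print(res)
```

Swapping %do% for %dopar% (with a registered backend) would hand the same six tasks to the workers.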
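When you combine two %dopar% operators and measure the execution time, you see that only the outer loop runs in parallel, which is probably not what you are looking for: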
system.time(
  foreach(i=1:2, .packages = 'foreach') %dopar% {
    # Outer calculation
    Sys.sleep(.5)
    foreach(j = 1:3) %dopar% {
      # Inner calculation
      Sys.sleep(1)
    }
  })
#    user  system elapsed
#    0.00    0.00    3.52
The elapsed time reflects:
parallel[ outer(0.5s) + sequential [3 * inner(1s)] ] ~ 3.5s
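If the outer calculation isn't too long, it is actually faster to move it into the inner loop, because all 6 workers from your example are then used: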
system.time(res <- foreach(i=1:2, .packages = 'foreach') %:%
  foreach(j = 1:3) %dopar% {
    # Outer calculation
    Sys.sleep(.5)
    # Inner calculation
    Sys.sleep(1)
  })
#    user  system elapsed
#    0.02    0.02    1.52
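The two elapsed times above can be sanity-checked with a back-of-the-envelope model. This is only a sketch: it assumes tasks are spread evenly over the 6 workers, ignores scheduling and communication overhead, and assumes the inner %dopar% runs sequentially on each worker (presumably because no backend is registered there). All variable names are made up for illustration.

```r
# Idealized timing model; overhead is ignored (assumption).
workers <- 6
n_outer <- 2
n_inner <- 3
t_outer <- 0.5  # seconds per outer calculation
t_inner <- 1    # seconds per inner calculation

# Two nested %dopar%: each outer iteration is one task that runs
# its inner iterations one after another.
est_nested <- ceiling(n_outer / workers) * (t_outer + n_inner * t_inner)

# %:% with the outer calculation moved into the body:
# n_outer * n_inner tasks, each costing t_outer + t_inner.
est_flat <- ceiling(n_outer * n_inner / workers) * (t_outer + t_inner)

print(c(nested = est_nested, flat = est_flat))
```

Under these assumptions the estimates are 3.5 s and 1.5 s, close to the measured 3.52 s and 1.52 s.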
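If the outer calculation is too long and there are many more inner iterations than outer ones, you can precompute the outer loop in parallel and then use the results inside %:%: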
system.time({
  precalc <- foreach(i=1:2) %dopar% {
    # Outer pre-calculation
    Sys.sleep(2)
    i
  }
  foreach(i=1:2, .packages = 'foreach') %:%
    foreach(j = 1:12) %dopar% {
      # Inner calculation
      Sys.sleep(1)
      precalc[[i]]*j
    }
})
#    user  system elapsed
#    0.11    0.00    5.25
This is faster than:
system.time({
  foreach(i=1:2, .packages = 'foreach') %:%
    foreach(j = 1:12) %dopar% {
      # Outer calculation
      Sys.sleep(2)
      # Inner calculation
      Sys.sleep(1)
      i*j
    }
})
#    user  system elapsed
#    0.13    0.00    9.21
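Coming back to the original question, here is a small sketch of moving the "between the loops" code into the body of a %:% loop. It uses %do% so it runs without a backend, and the intermediate value a <- i + 10 is a made-up stand-in for whatever the outer calculation would be. The point is only that both forms yield the same values.

```r
library(foreach)

# Nested form: the intermediate value is computed once per outer iteration.
nested <- foreach(i = 1:2) %do% {
  a <- i + 10  # hypothetical code between the outer and inner loop
  foreach(j = 1:3) %do% {
    a * j
  }
}

# Flattened form with %:%: the same code sits at the top of the body,
# so it is recomputed for every (i, j) task.
flat <- foreach(i = 1:2) %:%
  foreach(j = 1:3) %do% {
    a <- i + 10
    a * j
  }

# Compare the values produced by the two forms.
print(all.equal(unlist(nested), unlist(flat)))
```

The trade-off is the one measured above: the flattened form repeats the outer calculation for each inner iteration, which pays off only when that calculation is cheap relative to the gain in parallelism.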