How to Write R Package Documentation for a Function with Parallel Backend

Question

I want to turn this function into an R package.

Edit

#' Create sums package
#'
#' More detailed description
#'
#' @description This sums helps to
#'
#' @param x Numeric vector
#' @param methods Character vector naming the sums to compute
#'
#' @importFrom foreach foreach %dopar%
#' @importFrom doParallel registerDoParallel
#' @importFrom parallel detectCores makeCluster stopCluster
#'
#' @export
sums <- function(x, methods = c("sum", "squaredsum", "cubedsum")) {
  future::plan(future::multisession)     # plan() and multisession come from the future package
  n_cores <- parallel::detectCores()     # check how many cores the operating system reports
  cl <- parallel::makeCluster(n_cores)   # use all the detected cores
  doParallel::registerDoParallel(cores = parallel::detectCores())

  `%dopar%` <- foreach::`%dopar%`

  # Helpers: sum, sum of squares, and sum of cubes
  ss  <- function(x) foreach::foreach(i = x, .combine = "+") %dopar% {i}
  sss <- function(x) foreach::foreach(i = x, .combine = "+") %dopar% {i^2}
  ssq <- function(x) foreach::foreach(i = x, .combine = "+") %dopar% {i^3}

  output <- c()

  if ("sum" %in% methods) {
    output <- c(output, ss = ss(x))
  }

  if ("squaredsum" %in% methods) {
    output <- c(output, sss = sss(x))
  }

  if ("cubedsum" %in% methods) {
    output <- c(output, ssq = ssq(x))
  }

  parallel::stopCluster(cl = cl)
  return(output)
}

x <- 1:10

sums(x)


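What I need

Suppose my vector x is very large, something like x <- 1:9e9, and takes about 5 hours of serial processing to get through; parallel processing can help. How do I include: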

n_cores <- detectCores()
#cl <- makeCluster(n_cores)
#registerDoParallel(cores  =  detectCores())

in my .R file and DESCRIPTION file so that it is worthy of R package documentation?

Answer 1

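Even though the scope of the question is not entirely clear, I will try to make some relevant suggestions. As I understand it, you are running into problems when the checks on your package run examples or tests that use parallel computation.

First, remember that the check applies CRAN standards and that, for compatibility reasons, examples and tests in a CRAN package cannot use more than 2 cores. So your examples must be simple enough to run on 2 cores.
Second, your code has a problem: it creates a cluster but never uses it in doParallel.
Third, your snippet uses the parallel and doParallel packages, so they must be listed in the DESCRIPTION file, which you can do by running the following in your console: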

usethis::use_package("parallel")
usethis::use_package("doParallel")

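This adds both packages to the "Imports" section of DESCRIPTION. You then do not load these libraries explicitly inside your package.

You should also qualify the functions in your examples by writing the package name followed by "::", which makes your example look like this: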

    n_cores <- 2
    cl <- parallel::makeCluster(n_cores)
    doParallel::registerDoParallel(cl = cl)
    ...
    parallel::stopCluster(cl = cl)
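
For illustration only, a minimal self-contained sketch of the same pattern could look like the following. It assumes the foreach and doParallel packages are installed, and the loop body (summing the squares of 1:10) is a made-up placeholder, not the actual workload from the question.

```r
# Sketch only: the loop body below is a hypothetical placeholder.
n_cores <- 2                                # stay within the 2-core limit
cl <- parallel::makeCluster(n_cores)        # start two worker processes
doParallel::registerDoParallel(cl = cl)     # make %dopar% use this cluster

`%dopar%` <- foreach::`%dopar%`             # bind the operator without library(foreach)
result <- foreach::foreach(i = 1:10, .combine = "+") %dopar% {
  i^2
}

parallel::stopCluster(cl = cl)              # release the workers
result                                      # 385
```

Because every call is qualified with "::", nothing here depends on a library() call, which is what a package example needs.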

You can also find similar code in the documentation of registerDoParallel, where you will see that it is likewise limited to 2 cores.
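
One way to respect that limit from inside a package is to cap the worker count whenever a check appears to be running. The helper below is only a sketch: `pick_cores()` is a hypothetical name, and it relies on the `_R_CHECK_LIMIT_CORES_` environment variable that CRAN-style checks are understood to set, so treat the handling of that variable as an assumption to verify against your R version.

```r
# Hypothetical helper: choose a worker count that is safe under R CMD check.
pick_cores <- function(max_cores = 2L) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L     # detectCores() may return NA

  # Assumption: a non-empty value other than "false"/"FALSE" means the
  # check is enforcing the 2-core limit.
  limit <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
  if (nzchar(limit) && !(limit %in% c("false", "FALSE"))) {
    max_cores <- min(max_cores, 2L)
  }

  # Never return fewer than 1 or more than the machine reports
  max(1L, min(as.integer(max_cores), as.integer(available)))
}

n_cores <- pick_cores(max_cores = 4L)
```

The result can be passed straight to `parallel::makeCluster()`, so users get more workers on their own machines while examples and tests stay within the limit.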

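For completeness, I don't think you really need the foreach package, because R's built-in parallelization is quite powerful. If you want to be able to use your function with detectCores, I suggest adding a limiting parameter. This function should do what you want in a more "R-like" way: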

sums <- function(x, methods = c("sum", "squaredsum", "cubedsum"), maxcores = 2) {
  # Use at most `maxcores` workers, and never more than the machine reports
  n_cores <- min(maxcores, parallel::detectCores(), na.rm = TRUE)
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # stop the workers even on error

  outputs <- sapply(
    X = methods,
    FUN = function(method) {
      # Pick the transformation applied to each element
      f <- switch(
        method,
        sum        = function(i) i,
        squaredsum = function(i) i^2,
        cubedsum   = function(i) i^3,
        stop("unknown method: ", method)
      )
      output <- parallel::parSapply(cl = cl, X = x, FUN = f)
      # as.numeric() avoids integer overflow when x is an integer vector
      sum(as.numeric(output))
    }
  )

  outputs
}


x <- 1:10000000

sums(x = x, methods = c("sum", "squaredsum"), maxcores = 2)
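
To tie this back to documentation, here is a sketch of how a roxygen2 header for such a function might look. The function name `par_power_sum()` and its `power` argument are hypothetical simplifications of the function above, not part of the original code. The `@examples` entry is kept small and uses 2 cores so that it can run under a check.

```r
#' Sum of powers computed on a parallel cluster
#'
#' Raises every element of `x` to `power` on a cluster created with
#' \pkg{parallel} and returns the total.
#'
#' @param x Numeric vector.
#' @param power Single number; the exponent applied to each element.
#' @param maxcores Maximum number of worker processes to start.
#'
#' @return A single numeric value.
#'
#' @examples
#' par_power_sum(1:10, power = 2, maxcores = 2)
#'
#' @export
par_power_sum <- function(x, power = 1, maxcores = 2) {
  # Cap the worker count; na.rm guards against detectCores() returning NA
  n_cores <- min(maxcores, parallel::detectCores(), na.rm = TRUE)
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always release the workers

  # Extra arguments after FUN are forwarded to it on each worker
  powered <- parallel::parSapply(cl, x, function(i, p) i^p, p = power)
  sum(as.numeric(powered))
}

par_power_sum(1:10, power = 2, maxcores = 2)  # 385
```

Since every call is written as `parallel::...`, no `@importFrom` tag is needed here; listing parallel under Imports in DESCRIPTION is enough.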

foreach

r

package-development

r-package

doparallel