allowed/possible 在 Rcpp 的 pragma openmp parallel for 循环中调用 R 函数或 fortran 代码吗?

Is it allowed/possible to call an R function or fortran code within a pragma openmp parallel for loop in Rcpp?

在 Rcpp 项目中,我希望能够 call an R function (the cobs function from the cobs package to do a concave spline fit) or call the fortran code that it relies on (the cobs function uses quantreg's rq.fit.sfnc function to fit the constrained spline model, which in turn relies on the fortran coded srqfnc function in the quantreg package) within a pragma openmp parallel for loop(我的其余代码主要需要一些简单的线性代数,所以这没问题,但遗憾的是每个内循环迭代也需要我做一个凹样条拟合)。我想知道这是否被允许或可能,因为我认为这样的调用不是线程安全的?是否有一个简单的解决方法,比如用 #pragma omp critical 包围这些调用?有人有这方面的例子吗?或者在这种情况下,唯一的方法是首先使用线程安全的 Armadillo 类 对 cobsrq.fit.sfnc 函数进行完整的 Rcpp 移植?

引用 the manual:

Calling any of the R API from threaded code is ‘for experts only’ and strongly discouraged. Many functions in the R API modify internal R data structures and might corrupt these data structures if called simultaneously from multiple threads. Most R API functions can signal errors, which must only happen on the R main thread. Also, external libraries (e.g. LAPACK) may not be thread-safe.

我一直把这个解读为"one must not call an R API function from threaded code"。从 omp 并行区域内调用 R 函数,而不管内部使用的是什么。使用 #pragma omp critical 可能 有效,但如果它坏了,你必须保留碎片 ...

重新实现有问题的代码或在 C++ 中寻找现有实现并直接调用它会更安全。

所以我刚刚尝试过,似乎只有在 #pragma openmp parallel for 循环之前调用 R 函数才有效(否则会导致堆栈不平衡,并使 R 崩溃)。当然这会导致那部分代码顺序执行,但这在某些情况下可能仍然有用。

示例:

Rcpp部分,另存为文件"fitMbycol.cpp":

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// #define RCPP_ARMADILLO_RETURN_COLVEC_AS_VECTOR
using namespace Rcpp;
using namespace arma;
using namespace std;

#include <omp.h>
// [[Rcpp::plugins(openmp)]]

// [[Rcpp::export]]
arma::mat fitMbycol(arma::mat& M, Rcpp::Function f, const int nthreads) {

  // ARGUMENTS
  // M: matrix for which we want to fit given function f over each column
  // f: fitting function to use with one single argument (vector y) that returns the fitted values as a vector
  // nthreads: number of threads to use

  // we apply fitting function over columns
  int c = M.n_cols;
  int r = M.n_rows;
  arma::mat out(r,c);
  int i;
  omp_set_num_threads(nthreads);
#pragma omp parallel for shared(out)
  for (i = 0; i < c; i++) {
      arma::vec y = M.col(i); // ith column of M
#pragma omp critical
{
      out.col(i) = as<arma::colvec>(f(NumericVector(y.begin(),y.end())));
}
  }

  return out;

}

然后在 R 中:

首先是纯 R 版本:

(我们用泊松噪声模拟一些高斯峰形,然后使用 cobs 函数对它们进行对数凹样条拟合)

x=1:100
n=length(x)
ncols=50
gauspeak=function(x, u, w, h=1) h*exp(((x-u)^2)/(-2*(w^2)))
Y_nonoise=do.call(cbind,lapply(seq(min(x), max(x), length.out=ncols), function (u) gauspeak(x, u=u, w=10, h=u*100)))
set.seed(123)
Y=apply(Y_nonoise, 2, function (col) rpois(n,col))

# log-concave spline fit on each column of matrix Y using cobs
require(cobs)
logconcobs = function(y, tau=0.5, nknots=length(y)/10) {
  x = 1:length(y)
  offs = max(y)*1E-6
  weights = y^(1/2)
  fit.y = suppressWarnings(cobs(x=x,y=log10(y+offs), 
              constraint = "concave", lambda=0, 
              knots = seq(min(x),max(x), length.out = nknots), 
              nknots=nknots, knots.add = FALSE, repeat.delete.add = FALSE,
              keep.data = FALSE, keep.x.ps = TRUE,
              w=weights, 
              tau=tau, print.warn = F, print.mesg = F, rq.tol = 0.1, maxiter = 100)$fitted)
  return(pmax(10^fit.y - offs, 0))
}
library(microbenchmark)
microbenchmark(Y.fitted <- apply(Y, 2, function(col) logconcobs(y=col, tau=0.5)),times=5L) # 363 ms, ie 363/50=7 ms per fit
matplot(Y,type="l",lty=1)
matplot(Y_nonoise,type="l",add=TRUE, lwd=3, col=adjustcolor("blue",alpha.f=0.2),lty=1)
matplot(Y.fitted,type="l",add=TRUE, lwd=3, col=adjustcolor("red",alpha.f=0.2),lty=1)

现在使用 Rcpp#pragma openmp parallel for loop 中调用我们的 R 拟合函数 logconcobs,并附上 #pragma omp critical:

library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp('fitMbycol.cpp')
microbenchmark(Y.fitted <- fitMbycol(Y, function (y) logconcobs(y, tau=0.5, nknots=10), nthreads=8L ), times=5L) # 361 ms

OpenMP 当然最终在这种情况下没有任何效果,因为 #pragma omp critical 导致所有内容按顺序执行,但在更复杂的示例中这仍然有用。