Quick evaluation of binomial likelihood in Rcpp
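I need to evaluate a large number of binomial likelihoods quickly, so I am thinking of implementing this in Rcpp. One way to do it is as follows: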
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector eval_likelihood(arma::vec Yi,
                              arma::vec Ni,
                              arma::vec prob){
  // length of vector
  int N = prob.n_rows;
  // storage for evaluated log likelihoods
  NumericVector eval(N);
  for(int ii = 0; ii < N; ii++){
    int y = Yi(ii); // no. of successes
    int n = Ni(ii); // no. of trials
    double p = prob(ii); // success probability
    eval(ii) = R::dbinom(y,n,p,true); // argument 4 is set to true to return log-likelihood
  }
  return eval;
}
which returns log-likelihoods equivalent to those of dbinom() in R:
Rcpp::sourceCpp("dbinom.cpp") #source Rcpp script
# fake data
Yi = 1:999
Ni = 2:1000
probs = runif(999)
evalR = dbinom(Yi, Ni, probs, log = T) # vectorized solution in R
evalRcpp = eval_likelihood(Yi, Ni, probs) # my Rcpp solution
identical(evalR,evalRcpp)
[1] TRUE
Overall, this is a good result. However, the vectorized R solution is, on average, slightly faster than my naive Rcpp solution:
microbenchmark::microbenchmark(R = dbinom(Yi, Ni, probs, log = T),
Rcpp = eval_likelihood(Yi, Ni, probs))
Unit: microseconds
expr min lq mean median uq max neval cld
R 181.753 182.181 188.7497 182.6090 189.4515 286.100 100 a
Rcpp 178.760 179.615 197.5721 179.8285 184.7470 1397.144 100 a
Does anyone have guidance on evaluating binomial log-likelihoods faster? It could be faster code or some hack from probability theory. Thanks!
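Your implementation looks fine. Since R's dbinom() is already implemented in efficient C code, you probably won't be able to improve on it significantly. I do see a couple of things that might make a small difference (which could help if you're doing this a lot):
- You can use [ii] instead of (ii) to avoid bounds checking, since it sounds like you don't have to worry about that (i.e., this won't be a user-called function; it will only be called from your C++ code, and presumably your objects are set up in such a way that this won't be a problem)
- You can pass by reference rather than by value (see here)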
So, I added the following version of your function:
// [[Rcpp::export]]
NumericVector eval_likelihood2(const arma::vec& Yi,
                               const arma::vec& Ni,
                               const arma::vec& prob){
  // length of vector
  int N = prob.n_rows;
  // storage for evaluated log likelihoods
  NumericVector eval(N);
  for(int ii = 0; ii < N; ii++){
    int y = Yi[ii]; // no. of successes
    int n = Ni[ii]; // no. of trials
    double p = prob[ii]; // success probability
    eval[ii] = R::dbinom(y,n,p,1); // argument 4 is nonzero (true), so the log-likelihood is returned
  }
  return eval;
}
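You can see I only changed those two things.
I also benchmarked with somewhat larger data, though I included a benchmark for your original smaller example as well: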
Rcpp::sourceCpp("so.cpp") #source Rcpp script
# fake data
Yi = 1:99999
Ni = 2:100000
probs = runif(99999)
evalR = dbinom(Yi, Ni, probs, log = T) # vectorized solution in R
evalRcpp = eval_likelihood(Yi, Ni, probs) # my Rcpp solution
evalRcpp2 = eval_likelihood2(Yi, Ni, probs) # Rcpp version with the two changes
identical(evalR,evalRcpp)
# [1] TRUE
identical(evalR,evalRcpp2)
# [1] TRUE
microbenchmark::microbenchmark(R = dbinom(Yi, Ni, probs, log = T),
Rcpp = eval_likelihood(Yi, Ni, probs),
Rcpp2 = eval_likelihood2(Yi, Ni, probs))
Unit: milliseconds
expr min lq mean median uq max neval
R 7.427669 7.577011 8.565015 7.650762 7.916891 62.63154 100
Rcpp 7.368547 7.858408 8.884823 8.014881 8.353808 63.48417 100
Rcpp2 6.952519 7.256376 7.859609 7.376959 7.829000 12.51065 100
Yi = 1:999
Ni = 2:1000
probs = runif(999)
microbenchmark::microbenchmark(R = dbinom(Yi, Ni, probs, log = T),
Rcpp = eval_likelihood(Yi, Ni, probs),
Rcpp2 = eval_likelihood2(Yi, Ni, probs))
Unit: microseconds
expr min lq mean median uq max neval
R 90.073 100.5035 113.5084 109.5230 122.5260 188.304 100
Rcpp 90.188 97.8565 112.9082 105.2505 122.4255 172.975 100
Rcpp2 86.093 92.0745 103.9474 97.9380 113.2660 148.591 100
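As for the "hack from probability theory" part of the question, one possibility (not benchmarked here, so treat it as a sketch rather than a recommendation) is to split the log-likelihood into log choose(n, y) + y*log(p) + (n - y)*log(1 - p). The first term does not depend on p, so if the same Yi and Ni are evaluated against many different probability vectors, it could be computed once and reused. The standalone C++ below illustrates the idea. The names log_choose, precompute_log_choose and binom_loglik are hypothetical and not part of R or Rcpp, and the code assumes 0 < p < 1 and 0 <= y <= n.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): log of the binomial
// coefficient choose(n, y), computed via the log-gamma function.
inline double log_choose(double n, double y) {
    return std::lgamma(n + 1.0) - std::lgamma(y + 1.0) - std::lgamma(n - y + 1.0);
}

// Precompute the part of the log-likelihood that does not depend on p.
// This only needs to run once per data set (y, n).
std::vector<double> precompute_log_choose(const std::vector<double>& y,
                                          const std::vector<double>& n) {
    std::vector<double> out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        out[i] = log_choose(n[i], y[i]);
    }
    return out;
}

// Per-observation binomial log-likelihood, reusing the cached constants.
// Assumption: 0 < p < 1 and 0 <= y <= n; the edge cases that R's dbinom()
// handles (p == 0, p == 1, non-integer y, ...) are not covered here.
std::vector<double> binom_loglik(const std::vector<double>& y,
                                 const std::vector<double>& n,
                                 const std::vector<double>& p,
                                 const std::vector<double>& log_c) {
    std::vector<double> out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        out[i] = log_c[i]
               + y[i] * std::log(p[i])
               + (n[i] - y[i]) * std::log1p(-p[i]); // log(1 - p)
    }
    return out;
}
```

Because this uses the textbook formula rather than whatever dbinom() does internally, the results should agree with dbinom(..., log = TRUE) only up to floating-point error. If you try it, compare with all.equal() rather than identical(). Whether it is actually faster would need to be benchmarked on your data.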