RcppArmadillo：for 循环中的负索引

Question

我是 Rcpp 的新手，我正在尝试使用 RcppArmadillo 在 for() 循环中基于负索引执行计算。我已经发现 RcppArmadillo 中的负索引并不是那么简单，但它可以通过应该保留的元素向量来完成 (as I found here). That seems a bit difficult to me when the element to remove is the looping index. I tried to implement the last approach in ，但没有成功。有没有一种简单的方法来指定不包括具有循环索引的元素的元素向量？

所以，我试图在 RcppArmadillo 中为 y[-i] 在以下 MWE 中找到等效项：

# In R:
# Input 
n <- 10
y <- 1:20

# Computation
x <- rep(NA, n)
for(i in 1:n){
x[i] <- sum(y[-i])
}

到目前为止我的代码 Rcpp 代码：

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>   

// [[Rcpp::export]]
arma::vec rcpp_sum (arma::vec y, int n){

arma::vec x(n);
arma::vec ind(n);

for (i=0; i<n; i++){
ind[i] = /*no idea...*/
x[i] = sum(y[ind]);
}

return x;
}

非常感谢任何帮助！

Answer 1

对于这样的任务，最好跳过有问题的索引。在标准 C++ 中，我们会检查索引并跳过它。像这样：

// [[Rcpp::export]]
arma::vec rcpp_sum (arma::vec y, int n){
    
    arma::vec x(n);
    
    for (int i = 0; i < n; i++) {
        x[i] = 0; // Initialize value
        
        for (int j = 0; j < y.size(); ++j) {
            if (i != j) {
                x[i] += y[j];
            }
        }
    }
    
    return x;
}

在上面，我们正在远离糖语法。 IMO 在这种情况下没关系，因为替代方案并不过分复杂。虽然我们正在简化，但对 RcppArmadillo 的依赖并不是必需的，因为我们可以只使用纯 Rcpp

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector pure_rcpp_sum (NumericVector y, int n){
    
    NumericVector x(n);
    
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < y.size(); ++j) {
            if (i != j) {
                x[i] += y[j];
            }
        }
    }
    
    return x;
}

验证输出：

all.equal(as.vector(rcpp_sum(y, n)), x)
[1] TRUE

all.equal(pure_rcpp_sum(y, n), x)
[1] TRUE

更新

根据 OP 的要求，我们在基础 R 中针对此特定目的优化了方法。上面演示了如何解决非常具体的问题，即仅对向量中的值求和，而遗漏 C++ 中的一个值。这本质上是教学性的，不一定是完成这项特定任务的最佳方法（正如我们将在下面展示的那样）。

在我们展示简单的 R 代码之前，我想指出 OP 担心在 C++ 的内部循环中有一个简单的条件语句不应该被担心（因为是base R 中的情况）。据我所知，OP 的链接中演示的子集是 O(n) 并且具有额外逻辑向量的额外开销。我们上面提出的解决方案应该更有效，因为它在没有额外对象的情况下基本上做同样的事情。

现在，更新代码：

baseR <- function(y, n) {
    mySum <- sum(y)
    vapply(1:n, function(x) mySum - y[x], FUN.VALUE = 1)    
}

## Here is the OP code for reference
OP <- function(y, n) {
    x <- rep(NA, n)
    for(i in 1:n) {x[i] <- sum(y[-i])}
    x
}

就是这样。它也快如闪电：

huge_y <- rnorm(1e6)
huge_n <- 1e3

system.time(t1 <- baseR(huge_y, huge_n))
 user  system elapsed 
 0.003   0.000   0.003

system.time(t2 <- pure_rcpp_sum(huge_y, huge_n))
 user  system elapsed 
2.776   0.003   2.779 

system.time(t3 <- OP(huge_y, huge_n))
 user  system elapsed 
9.555   1.248  10.805

all.equal(t1, t2)
[1] TRUE

all.equal(t1, t3)
[1] TRUE

RcppArmadillo：for 循环中的负索引

RcppArmadillo: Negative indexing in for-loop

indexing

loops

rcpp

rcpparmadillo

更新