Rcpp 与 R 函数。我的代码中有错误?

Rcpp vs R function. bug in my code?

为什么我从这两个函数得到不同的结果? math.h::pow() 有什么我做错的吗?还是我需要使用与 cmath::abs() 不同的 abs() 函数?

library(Rcpp)
sourceCpp(...) # below C++ function

#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;

// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
  int nx= x.size(), ny = y.size();

  if (nx != ny) {
    std::cout << "ERROR: Length of x and y differ." << std::endl;
    return -1;
  }

  double temp = 0.0;
  int m = 0;
  for (int i = 0; i < nx; i++) {
    if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
      ++m;
      temp += pow(abs(x[i] - y[i]), (double) q);
    }
  }
  temp = (1 / (double) m * temp);
  return pow(temp, (1 / (double) q));
}

#--------------------------------------------------
# R function
dist_qR <- function(x, y, q= 2) {
  if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
  if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")

  m <- sum(!is.na(x) & !is.na(y))

  return((1 / m * sum(abs(x - y)^q, na.rm=TRUE))^(1/q))
}


# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA

> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269

回复:@Konrad -- R> ?"&&":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

时机

使用 fabs 如下:

x <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))

y <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))


x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})

library(microbenchmark)

microbenchmark(n100_r= dist_qR(x[[1]], y[[1]], 3),
               n1000_r= dist_qR(x[[2]], y[[2]], 3),
               n10000_r= dist_qR(x[[3]], y[[3]], 3),
               n100000_r= dist_qR(x[[4]], y[[4]], 3),
               n100_c= dist_q(x[[1]], y[[1]], 3),
               n1000_c= dist_q(x[[2]], y[[2]], 3),
               n10000_c= dist_q(x[[3]], y[[3]], 3),
               n100000_c= dist_q(x[[4]], y[[4]], 3), times= 50)

Unit: microseconds
      expr       min        lq        mean     median        uq       max neval  cld
    n100_r    33.431    42.521    72.43820    83.8690    95.599   120.525    50 a   
   n1000_r   125.803   133.720   163.82538   167.1510   189.731   208.792    50 a   
  n10000_r   954.812  1061.260  1190.66434  1086.6260  1131.053  3577.318    50  b  
 n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607    50    d
    n100_c    12.317    14.662    20.83844    20.5270    26.099    39.002    50 a   
   n1000_c    91.200    97.651   104.58960   105.1295   109.674   119.938    50 a   
  n10000_c   875.928   939.270   953.09926   951.4395   970.060  1015.807    50  b  
 n100000_c  8272.492  9227.011  9472.04164  9339.7640  9531.108 12799.929    50   c 

# missing values not shown, but I get roughly the same timing

看起来大向量的加速是最小的——可能是因为 R 中的数学函数是在 C 中实现的。但短向量的加速是显着的。我猜这可能是由于我在 R 中用于类型检查的开销。

看起来 abs 由于某种原因被 numericVector class 弄糊涂了。改成C版fabs解决问题:

    ...
    temp += pow(fabs(x[i] - y[i]), (double) q);  // fabs instead of abs
    ...

> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269