Rcpp 与 R 函数。我的代码中有错误?
Rcpp vs R function. bug in my code?
为什么我从这两个函数得到不同的结果? math.h::pow()
有什么我做错的吗?还是我需要使用与 cmath::abs()
不同的 abs()
函数?
library(Rcpp)
sourceCpp(...) # below C++ function
#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;
// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
int nx= x.size(), ny = y.size();
if (nx != ny) {
std::cout << "ERROR: Length of x and y differ." << std::endl;
return -1;
}
double temp = 0.0;
int m = 0;
for (int i = 0; i < nx; i++) {
if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
++m;
temp += pow(abs(x[i] - y[i]), (double) q);
}
}
temp = (1 / (double) m * temp);
return pow(temp, (1 / (double) q));
}
#--------------------------------------------------
# R function
dist_qR <- function(x, y, q= 2) {
if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")
m <- sum(!is.na(x) & !is.na(y))
return((1 / m * sum(abs(x - y)^q, na.rm=TRUE))^(1/q))
}
# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA
> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269
回复:@Konrad -- R> ?"&&"
:
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
时机
使用 fabs
如下:
x <- list(x= rnorm(100),
x2= rnorm(1000),
x3= rnorm(10000),
x4= rnorm(100000))
y <- list(x= rnorm(100),
x2= rnorm(1000),
x3= rnorm(10000),
x4= rnorm(100000))
x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})
library(microbenchmark)
microbenchmark(n100_r= dist_qR(x[[1]], y[[1]], 3),
n1000_r= dist_qR(x[[2]], y[[2]], 3),
n10000_r= dist_qR(x[[3]], y[[3]], 3),
n100000_r= dist_qR(x[[4]], y[[4]], 3),
n100_c= dist_q(x[[1]], y[[1]], 3),
n1000_c= dist_q(x[[2]], y[[2]], 3),
n10000_c= dist_q(x[[3]], y[[3]], 3),
n100000_c= dist_q(x[[4]], y[[4]], 3), times= 50)
Unit: microseconds
expr min lq mean median uq max neval cld
n100_r 33.431 42.521 72.43820 83.8690 95.599 120.525 50 a
n1000_r 125.803 133.720 163.82538 167.1510 189.731 208.792 50 a
n10000_r 954.812 1061.260 1190.66434 1086.6260 1131.053 3577.318 50 b
n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607 50 d
n100_c 12.317 14.662 20.83844 20.5270 26.099 39.002 50 a
n1000_c 91.200 97.651 104.58960 105.1295 109.674 119.938 50 a
n10000_c 875.928 939.270 953.09926 951.4395 970.060 1015.807 50 b
n100000_c 8272.492 9227.011 9472.04164 9339.7640 9531.108 12799.929 50 c
# missing values not shown, but I get roughly the same timing
看起来大向量的加速是最小的——可能是因为 R 中的数学函数是在 C 中实现的。但短向量的加速是显着的。我猜这可能是由于我在 R 中用于类型检查的开销。
看起来 abs
由于某种原因被 numericVector
class 弄糊涂了。改成C版fabs
解决问题:
...
temp += pow(fabs(x[i] - y[i]), (double) q); // fabs instead of abs
...
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269
为什么我从这两个函数得到不同的结果? math.h::pow()
有什么我做错的吗?还是我需要使用与 cmath::abs()
不同的 abs()
函数?
library(Rcpp)
sourceCpp(...) # below C++ function
#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;
// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
int nx= x.size(), ny = y.size();
if (nx != ny) {
std::cout << "ERROR: Length of x and y differ." << std::endl;
return -1;
}
double temp = 0.0;
int m = 0;
for (int i = 0; i < nx; i++) {
if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
++m;
temp += pow(abs(x[i] - y[i]), (double) q);
}
}
temp = (1 / (double) m * temp);
return pow(temp, (1 / (double) q));
}
#--------------------------------------------------
# R function
dist_qR <- function(x, y, q= 2) {
if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")
m <- sum(!is.na(x) & !is.na(y))
return((1 / m * sum(abs(x - y)^q, na.rm=TRUE))^(1/q))
}
# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA
> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269
回复:@Konrad -- R> ?"&&"
:
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
时机
使用 fabs
如下:
x <- list(x= rnorm(100),
x2= rnorm(1000),
x3= rnorm(10000),
x4= rnorm(100000))
y <- list(x= rnorm(100),
x2= rnorm(1000),
x3= rnorm(10000),
x4= rnorm(100000))
x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})
library(microbenchmark)
microbenchmark(n100_r= dist_qR(x[[1]], y[[1]], 3),
n1000_r= dist_qR(x[[2]], y[[2]], 3),
n10000_r= dist_qR(x[[3]], y[[3]], 3),
n100000_r= dist_qR(x[[4]], y[[4]], 3),
n100_c= dist_q(x[[1]], y[[1]], 3),
n1000_c= dist_q(x[[2]], y[[2]], 3),
n10000_c= dist_q(x[[3]], y[[3]], 3),
n100000_c= dist_q(x[[4]], y[[4]], 3), times= 50)
Unit: microseconds
expr min lq mean median uq max neval cld
n100_r 33.431 42.521 72.43820 83.8690 95.599 120.525 50 a
n1000_r 125.803 133.720 163.82538 167.1510 189.731 208.792 50 a
n10000_r 954.812 1061.260 1190.66434 1086.6260 1131.053 3577.318 50 b
n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607 50 d
n100_c 12.317 14.662 20.83844 20.5270 26.099 39.002 50 a
n1000_c 91.200 97.651 104.58960 105.1295 109.674 119.938 50 a
n10000_c 875.928 939.270 953.09926 951.4395 970.060 1015.807 50 b
n100000_c 8272.492 9227.011 9472.04164 9339.7640 9531.108 12799.929 50 c
# missing values not shown, but I get roughly the same timing
看起来大向量的加速是最小的——可能是因为 R 中的数学函数是在 C 中实现的。但短向量的加速是显着的。我猜这可能是由于我在 R 中用于类型检查的开销。
看起来 abs
由于某种原因被 numericVector
class 弄糊涂了。改成C版fabs
解决问题:
...
temp += pow(fabs(x[i] - y[i]), (double) q); // fabs instead of abs
...
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269