Fast cbind of matrices in R using Rcpp
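cbind in R is relatively time-consuming when called repeatedly, although it is powerful and works on many data types. When binding two matrices, the code I wrote is about 3 times faster than cbind, but bind_cols from the dplyr package is a full 100 times faster than cbind. The only pity is that bind_cols cannot take matrices as input. Can someone make the code below faster? Also, how can I bind sparse matrices quickly? Here is the code I used: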
library(Rcpp)
func <- 'NumericMatrix mmult(NumericMatrix a, NumericMatrix b) {
  // the number of columns of the first matrix
  int acoln = a.ncol();
  // the number of columns of the second matrix
  int bcoln = b.ncol();
  // both inputs must have the same number of rows
  if (a.nrow() != b.nrow()) stop("a and b must have the same number of rows");
  // build a new matrix with a.nrow() rows and acoln + bcoln columns
  NumericMatrix out(a.nrow(), acoln + bcoln);
  for (int j = 0; j < acoln + bcoln; j++) {
    if (j < acoln) {
      // copy column j of the first matrix
      out(_, j) = a(_, j);
    } else {
      // put the columns of the second matrix after those of the first
      out(_, j) = b(_, j - acoln);
    }
  }
  return out;
}'
# a: 2000 x 100 matrix of 1s; b: 2000 x 10 matrix of 2s
a <- matrix(rep(1, 2000 * 100), 2000)
b <- matrix(rep(2, 2000 * 10), 2000)
cppFunction(func)
system.time(for (i in seq(1, 800)) {mmult(a, b)})
system.time(for (i in seq(1, 800)) {cbind(a, b)})
identical(mmult(a, b), cbind(a, b))
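On the sparse-matrix part of the question: the Matrix package stores general sparse numeric matrices as dgCMatrix objects, in compressed sparse column (CSC) form with slots i (0-based row indices), p (column pointers, length ncol + 1) and x (values), and it already provides cbind methods for its sparse classes. The standalone C++ sketch below only illustrates why column-binding in CSC form is cheap; Csc and csc_cbind are hypothetical names, not part of Matrix or Rcpp. The i and x arrays are concatenated unchanged and the second matrix's column pointers are shifted by the number of nonzeros in the first, so no dense intermediate is ever built.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical CSC container mirroring the slots of Matrix's dgCMatrix:
// the nonzeros of column j occupy positions p[j] .. p[j + 1] - 1 of i and x.
struct Csc {
    int nrow = 0;
    int ncol = 0;
    std::vector<int> p{0};   // column pointers, length ncol + 1
    std::vector<int> i;      // 0-based row index of each nonzero
    std::vector<double> x;   // value of each nonzero
};

// Column-bind two CSC matrices that have the same number of rows.
Csc csc_cbind(const Csc& a, const Csc& b) {
    if (a.nrow != b.nrow) {
        throw std::invalid_argument("a and b must have the same number of rows");
    }
    Csc out;
    out.nrow = a.nrow;
    out.ncol = a.ncol + b.ncol;

    // Row indices and values are concatenated unchanged.
    out.i = a.i;
    out.i.insert(out.i.end(), b.i.begin(), b.i.end());
    out.x = a.x;
    out.x.insert(out.x.end(), b.x.begin(), b.x.end());

    // Keep a's column pointers, then append b's (minus its leading 0)
    // shifted by the number of nonzeros in a.
    const int nnz_a = a.p.back();
    out.p = a.p;
    for (std::size_t j = 1; j < b.p.size(); ++j) {
        out.p.push_back(b.p[j] + nnz_a);
    }

    assert(out.p.size() == static_cast<std::size_t>(out.ncol) + 1);
    return out;
}
```

For example, binding a 3 x 2 matrix with p = {0, 1, 3} (three nonzeros) to a 3 x 1 matrix with p = {0, 1} gives p = {0, 1, 3, 4}: the second matrix's pointer 1 becomes 1 + 3. The cost is linear in the number of nonzeros, independent of how many zeros the dense equivalent would hold.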
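Borrowing an idea from Romain Francois from one of my earlier Rcpp adventures: allocate the result with no_init_matrix instead of the NumericMatrix(nrow, ncol) constructor, which zero-fills the new matrix even though the loop overwrites every element anyway.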
func1 <- 'NumericMatrix mmult1(NumericMatrix a, NumericMatrix b) {
  int acoln = a.ncol();
  int bcoln = b.ncol();
  // no_init_matrix allocates without zero-filling; the loop below overwrites every element
  NumericMatrix out = no_init_matrix(a.nrow(), acoln + bcoln);
  for (int j = 0; j < acoln + bcoln; j++) {
    if (j < acoln) {
      out(_, j) = a(_, j);
    } else {
      out(_, j) = b(_, j - acoln);
    }
  }
  return out;
}'
cppFunction(func1)
set.seed(42)
a <- matrix(rnorm(1e7), 1e3)
b <- matrix(runif(1e7), 1e3)
identical(mmult(a, b), mmult1(a, b))
#TRUE
library(microbenchmark)
microbenchmark(mmult(a, b),
mmult1(a, b),
cbind(a, b),
times = 10)
#Unit: milliseconds
#          expr    min     lq   mean median    uq   max neval
#   mmult(a, b)  69.64  70.52  89.71  72.28 128.8 136.6    10
#  mmult1(a, b)  50.84  50.95  69.65  51.43 111.6 114.4    10
#   cbind(a, b) 192.35 194.67 201.13 195.30 196.1 255.9    10
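Not a big deal, but not bad for such a trivial change: the median time drops from 72.28 ms for mmult to 51.43 ms for mmult1 (about 29% less), and both are well ahead of cbind at 195.30 ms.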
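mmult1 still copies column by column; its gain comes from skipping the zero fill. Because R stores a numeric matrix in column-major order, the data of cbind(a, b) is simply a's data followed by b's, so the whole loop can in principle collapse into two bulk copies. The plain C++ sketch below is not Rcpp code, and ColMajor and cbind_dense are hypothetical names. It shows both ideas: new double[n] leaves the buffer uninitialized, much as no_init_matrix does, and std::copy moves each input in one pass. Porting this to Rcpp (for example, std::copy from a.begin() and b.begin() into out.begin()) is an untested suggestion and has not been benchmarked here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical column-major matrix: element (r, c) lives at data[r + c * nrow],
// the same layout R uses for numeric matrices.
struct ColMajor {
    std::size_t nrow;
    std::size_t ncol;
    std::unique_ptr<double[]> data;

    double at(std::size_t r, std::size_t c) const {
        assert(r < nrow && c < ncol);  // debug-only bounds check
        return data[r + c * nrow];
    }
};

// Bind b's columns after a's. Both inputs are column-major with nrow rows,
// so the result's buffer is a's buffer followed by b's: two bulk copies.
ColMajor cbind_dense(const double* a, std::size_t a_ncol,
                     const double* b, std::size_t b_ncol,
                     std::size_t nrow) {
    const std::size_t ncol = a_ncol + b_ncol;
    // new double[n] default-initializes the doubles, i.e. leaves them unset
    // (no zero fill), which is the plain C++ analogue of no_init_matrix.
    ColMajor out{nrow, ncol, std::unique_ptr<double[]>(new double[nrow * ncol])};
    double* rest = std::copy(a, a + nrow * a_ncol, out.data.get());
    std::copy(b, b + nrow * b_ncol, rest);
    return out;
}
```

With a 2 x 2 input {1, 2, 3, 4} and a 2 x 1 input {5, 6}, the result's buffer is {1, 2, 3, 4, 5, 6}, so at(0, 2) reads 5 and at(1, 2) reads 6. Whether two bulk copies beat the column loop in practice depends on the compiler and memory bandwidth; only a benchmark like the one above can tell.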