How to make the apply() function faster?
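I have two matrices. I want to use each column of the first as a logical index to filter both the rows and the columns of the second, then take the sum of the filtered submatrix. I used the following code, and it works fine.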
apply(firstMat,2,function(x) sum(secondMat[x,x]))
However, the dataset is large, and I would like to find an alternative that speeds up the processing.
Here is a small reproducible example:
firstMat <- matrix(c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1), nrow = 5, ncol = 5)
I would be grateful for any help.
You can run the apply call in parallel across several worker processes:
firstMat <- matrix(c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1), nrow = 5, ncol = 5)
# create the cluster
library(doSNOW)
cl <- makeCluster(2, type = "SOCK") # creates one cluster with 2 worker processes
# detectCores() from package `parallel` reports the number of cores on your machine
registerDoSNOW(cl) # registers the cluster as the foreach backend (used by plyr's .parallel)
clusterExport(cl, "secondMat") # export secondMat to every worker, since the function refers to it there
# Option 1: Using parApply from package `parallel`
library(parallel)
parApply(cl, firstMat, 2, function(x) sum(secondMat[x, x]))
# Option 2: Using aaply from package `plyr`
library(plyr)
aaply(firstMat, 2, function(x) sum(secondMat[x, x]), .parallel = TRUE)
stopCluster(cl) # shut the workers down when finished
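On the small reproducible example this shows no speed improvement, since the cost of starting workers and sending data to them dominates, but I would expect both options to be faster than apply for large matrices.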
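To check whether parallelising pays off on your own data, one option is to time both versions on a larger matrix. The following is only a sketch under stated assumptions: the matrix sizes, the random fill, and the names `serial_res` and `parallel_res` are made up for illustration, and it uses only the `parallel` package. Here `secondMat` is passed to the worker function as an extra argument (`m`), which is an alternative to `clusterExport`.

```r
library(parallel)

set.seed(1)  # synthetic data, for illustration only
n <- 200     # rows of firstMat, and rows/columns of secondMat
k <- 50      # columns of firstMat
firstMat  <- matrix(runif(n * k) < 0.3, nrow = n, ncol = k)   # logical filter columns
secondMat <- matrix(rbinom(n * n, 1, 0.5), nrow = n, ncol = n)

cl <- makeCluster(2)  # 2 worker processes

# Serial baseline
t_serial <- system.time(
  serial_res <- apply(firstMat, 2, function(x) sum(secondMat[x, x]))
)

# Parallel version: the matrix travels to the workers as the argument `m`
t_parallel <- system.time(
  parallel_res <- parApply(cl, firstMat, 2,
                           function(x, m) sum(m[x, x]), m = secondMat)
)

stopCluster(cl)

# Both approaches should agree; compare t_serial and t_parallel on your machine
stopifnot(isTRUE(all.equal(serial_res, parallel_res)))
```

Whether the parallel version wins depends on the matrix sizes and the number of cores, so the timings are worth measuring rather than assuming.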
Perhaps your BLAS is faster than an explicit loop:
diag( t(firstMat) %*% secondMat %*% firstMat )
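The matrix form works because a logical column x is coerced to 0/1, so t(x) %*% secondMat %*% x adds up exactly the entries secondMat[i, k] where both x[i] and x[k] are TRUE, which is sum(secondMat[x, x]). Since only the diagonal of the product is needed, a possible refinement (my suggestion, not part of the answer above) is to replace the last matrix product with an elementwise product and `colSums`, which avoids building the full square result. A small check on the example data, with `loop_res`, `diag_res` and `col_res` as illustrative names:

```r
firstMat <- matrix(c(TRUE, FALSE, TRUE, FALSE, FALSE,
                     TRUE, TRUE, FALSE, FALSE, FALSE), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1, 0,0,0,1,1, 1,0,1,0,1, 1,1,0,0,0, 1,1,1,0,1),
                    nrow = 5, ncol = 5)

# Original loop: index rows and columns of secondMat by each column of firstMat
loop_res <- apply(firstMat, 2, function(x) sum(secondMat[x, x]))

# Matrix form from the answer: diagonal of the quadratic form
diag_res <- diag(t(firstMat) %*% secondMat %*% firstMat)

# Variant that computes only the diagonal entries
col_res <- colSums(firstMat * (secondMat %*% firstMat))

loop_res  # 3 1
stopifnot(all.equal(loop_res, diag_res), all.equal(loop_res, col_res))
```

Note that both matrix forms touch every entry of secondMat, so if each column of firstMat selects only a few rows, the indexing loop may still do less work.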