How to make the apply() function faster?

Question

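I have two matrices. I want to use each column of the first as a logical filter on the rows and columns of the second, and then find the sum of the filtered entries. I used the following code, and it works fine.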

apply(firstMat,2,function(x) sum(secondMat[x,x]))

However, the data set is large, and I would like to find an alternative that speeds up the processing.

Here is a small reproducible example:

firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)
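
To make the operation concrete, here is a short walk-through of what the call does for a single column of this example. It is only an illustration; the variable `x` is introduced here for the walk-through.

```r
firstMat  <- matrix(c(T,F,T,F,F,T,T,F,F,F), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1), nrow = 5, ncol = 5)

x <- firstMat[, 1]       # TRUE FALSE TRUE FALSE FALSE
which(x)                 # positions kept: 1 and 3
secondMat[x, x]          # 2 x 2 block made of rows 1,3 and columns 1,3
sum(secondMat[x, x])     # 3

# The full call repeats this for every column of firstMat
apply(firstMat, 2, function(x) sum(secondMat[x, x]))   # 3 1
```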

I would appreciate any help.

Answer 1

You can run the apply call in parallel on a cluster of worker processes:

firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)

# create a cluster
library(doSNOW)
cl <- makeCluster(2, type = "SOCK") # creates one cluster with 2 worker processes
# can use detectCores() from package parallel to check the number of cores in your machine
registerDoSNOW(cl)
clusterExport(cl, "secondMat") # export secondMat to every worker, since the function uses it there

# Option 1: Using parApply from package `parallel`
library(parallel)
parApply(cl,firstMat,2,function(x) sum(secondMat[x,x]))

# Option 2: Using aaply from package `plyr` (runs on the foreach backend registered above)
library(plyr)
aaply(firstMat,2,function(x) sum(secondMat[x,x]),.parallel=TRUE)

stopCluster(cl)

On the small reproducible example this shows no speed improvement, but I would expect both options to be faster than apply on large matrices.
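
Whether the parallel version actually wins depends on the matrix sizes and on the cost of shipping data to the workers, so it is worth timing both. Below is a rough timing sketch using only the base `parallel` package. The sizes, the seed, and the names `bigFirst`, `bigSecond`, and `sumBlock` are made up for illustration and are not from the question.

```r
library(parallel)

set.seed(1)                       # arbitrary seed, only for repeatability
n <- 500; k <- 100                # hypothetical sizes; adjust to your data
bigFirst  <- matrix(runif(n * k) > 0.5, nrow = n, ncol = k)
bigSecond <- matrix(rnorm(n * n), nrow = n, ncol = n)

# Same computation as in the question; the matrix is passed as an
# argument so it does not have to be exported to the workers separately
sumBlock <- function(x, m) sum(m[x, x])

serialTime <- system.time(
  serialRes <- apply(bigFirst, 2, sumBlock, m = bigSecond)
)

cl <- makeCluster(2)              # 2 worker processes
parallelTime <- system.time(
  parallelRes <- parApply(cl, bigFirst, 2, sumBlock, m = bigSecond)
)
stopCluster(cl)

# Elapsed wall-clock seconds for each version
print(c(serial = serialTime[["elapsed"]], parallel = parallelTime[["elapsed"]]))

# Both versions should give the same sums
stopifnot(isTRUE(all.equal(serialRes, parallelRes)))
```

The parallel timing here includes sending `bigSecond` to the workers, which is part of the real cost and can outweigh the gain for small inputs.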

Answer 2

Perhaps your BLAS is faster than an explicit loop:

diag( t(firstMat) %*% secondMat %*% firstMat )
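
The reason this works: for a logical column x treated as 0/1, t(x) %*% secondMat %*% x adds up secondMat[i,j] over exactly those i and j where x is TRUE, which is sum(secondMat[x,x]). The formula above also computes every off-diagonal entry and then throws it away. A possible variant that computes only the diagonal is sketched below. It assumes firstMat contains no NA values, and the names `X`, `ref`, `viaDiag`, and `viaColSums` are just for this illustration.

```r
firstMat  <- matrix(c(T,F,T,F,F,T,T,F,F,F), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1), nrow = 5, ncol = 5)

# Reference result from the original apply() call
ref <- apply(firstMat, 2, function(x) sum(secondMat[x, x]))

# Formula from this answer: builds the full k x k product, keeps the diagonal
viaDiag <- diag(t(firstMat) %*% secondMat %*% firstMat)

# Variant: diag(t(X) %*% B) equals colSums(X * B), so the k x k matrix
# is never formed
X <- firstMat + 0                         # logical -> numeric 0/1
viaColSums <- colSums(X * (secondMat %*% X))

stopifnot(isTRUE(all.equal(ref, viaDiag)),
          isTRUE(all.equal(ref, viaColSums)))
```

With many columns in firstMat the colSums form should need noticeably less memory and arithmetic than the diag form, though I have not benchmarked it on real data.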

r

apply