Smart way to loop over all rows in A and correlate with all columns in B

First post, so be gentle ;-)

I have a scenario where I want to correlate all rows of matrix A (approx. 50,000) with all columns of matrix B (approx. 100). I have solved it like this:

output = c()
for( i in 1:nrow(A) ){
    for(j in 1:ncol(B) ){
        myTest = cor.test(A[i,],B[,j],method="spearman")
        output = rbind(output,c(rownames(A)[i],colnames(B)[j],
                                myTest$p.value,myTest$estimate))
    }
}

But it is hopelessly slow: it has been running for 30 hours and still has not finished.

Surely there must be a smarter way? :-)

Cheers!

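Your code is slow mainly because of the rbind call: on every iteration it creates a new matrix and copies all the data from the previous one. As the result grows, that copying becomes a huge overhead.

A simple solution is to create the matrix before the loop and then fill it in: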

# Preallocate every row once; the first mixed assignment below coerces the matrix to character
output = matrix(0, nrow=nrow(A)*ncol(B), ncol=4)
for(i in 1:nrow(A)){
    for(j in 1:ncol(B) ){
        myTest = cor.test(A[i,],B[,j],method="spearman")
        output[(i-1)*ncol(B)+j,] = c(rownames(A)[i],colnames(B)[j],
                                myTest$p.value,myTest$estimate)
    }
}
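One side effect of filling a single matrix with names and numbers is that every cell ends up as a character string. A minimal sketch of the same preallocation idea, assuming you would rather keep the p-values and estimates numeric, is to preallocate one vector per column and combine them at the end. The helper name `correlate_prealloc` and the small example matrices are made up for illustration:

```r
# Hypothetical helper: same double loop, but with one preallocated,
# correctly typed vector per output column.
correlate_prealloc <- function(A, B, method = "spearman") {
  n_pairs  <- nrow(A) * ncol(B)
  a_name   <- character(n_pairs)
  b_name   <- character(n_pairs)
  p_value  <- numeric(n_pairs)
  estimate <- numeric(n_pairs)
  k <- 1
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(B))) {
      test <- cor.test(A[i, ], B[, j], method = method)
      a_name[k]   <- rownames(A)[i]
      b_name[k]   <- colnames(B)[j]
      p_value[k]  <- test$p.value
      estimate[k] <- test$estimate
      k <- k + 1
    }
  }
  data.frame(a_name, b_name, p_value, estimate, stringsAsFactors = FALSE)
}

# Small made-up example (A and B need row/column names, as in the question)
set.seed(1)
A <- matrix(rnorm(5 * 20), nrow = 5, dimnames = list(paste0("Arow", 1:5), NULL))
B <- matrix(rnorm(20 * 3), ncol = 3, dimnames = list(NULL, paste0("Bcol", 1:3)))
res <- correlate_prealloc(A, B)
str(res)
```

The loop still calls cor.test once per pair, so this should cost about the same as the matrix version; the only difference is that the numeric columns stay numeric.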

OK, so I tried @Math's suggestion and decided to time it using the following code:

# Clear workspace
rm(list = ls())

# Reproducible results
set.seed(42)

# Set dimensions
n1 = 500
n2 = 150
n3 = 100

# Create matrices
A = matrix(rnorm(n1*n2),nrow=n1,ncol=n2)
B = matrix(rnorm(n2*n3),nrow=n2,ncol=n3)

# Assign row/col names
rownames(A)=paste("Arow",seq(1,nrow(A)),sep="")
colnames(A)=paste("Acol",seq(1,ncol(A)),sep="")
rownames(B)=paste("Brow",seq(1,nrow(B)),sep="")
colnames(B)=paste("Bcol",seq(1,ncol(B)),sep="")

# State number of correlations to be performed
cat(paste("Total number of correlations =",nrow(A)*ncol(B),"\n"))

# Test 1 using rbind()
cat("Starting test 1 with rbind()\n")
ptm = proc.time()
output = c()
for( i in 1:nrow(A) ){
    for( j in 1:ncol(B) ){
        myTest = cor.test(A[i,],B[,j],method="spearman")
        output = rbind(output,c(rownames(A)[i],colnames(B)[j],
                                myTest$p.value,myTest$estimate))
    }
}
print(proc.time() - ptm)

# Test 2 using pre-built matrix
cat("Starting test 2 with pre-built matrix\n")
ptm = proc.time()
output = matrix(0, nrow=nrow(A)*ncol(B), ncol=4)
count  = 1
for( i in 1:nrow(A) ){
    for( j in 1:ncol(B) ){
        myTest = cor.test(A[i,],B[,j],method="spearman")
        output[count,] = c(rownames(A)[i],colnames(B)[j],
                           myTest$p.value,myTest$estimate)
        count = count + 1
    }
}
print(proc.time() - ptm)

Running this code produces the following results:

Total number of correlations = 50000 
Starting test 1 with rbind()
   user  system elapsed 
275.560   6.963 282.913 
Starting test 2 with pre-built matrix
   user  system elapsed 
 29.869   0.218  30.114 

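So there is clearly a big difference: about 283 s elapsed with rbind() versus about 30 s with the pre-built matrix, for the same 50,000 correlations. I was not aware of this 'problem' with building a matrix incrementally using rbind(). Thanks @Math for pointing it out! :-)

Cheers!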
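Even with preallocation, the loop still makes one cor.test call per pair. A possible further step, sketched here under assumptions and not timed, is to skip the loop: cor() accepts two matrices and returns all column-by-column correlations at once, so transposing A gives every row of A against every column of B. The helper name `cor_rows_cols` is made up, and the p-values below use the asymptotic t approximation, which is an assumption on my part and will not exactly match what cor.test reports by default for Spearman:

```r
# Hypothetical helper: all row-of-A vs column-of-B Spearman correlations at once.
cor_rows_cols <- function(A, B) {
  stopifnot(ncol(A) == nrow(B))
  n   <- ncol(A)                              # observations per correlation
  rho <- cor(t(A), B, method = "spearman")    # nrow(A) x ncol(B) matrix
  # Assumption: two-sided p-values from the asymptotic t approximation
  t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
  p      <- 2 * pt(-abs(t_stat), df = n - 2)
  # as.vector() walks the matrix column by column, so row names cycle fastest
  data.frame(
    a_name   = rep(rownames(A), times = ncol(B)),
    b_name   = rep(colnames(B), each = nrow(A)),
    estimate = as.vector(rho),
    p_value  = as.vector(p),
    stringsAsFactors = FALSE
  )
}

# Same dimensions as the timing test (500 x 150 and 150 x 100)
set.seed(42)
A <- matrix(rnorm(500 * 150), nrow = 500, dimnames = list(paste0("Arow", 1:500), NULL))
B <- matrix(rnorm(150 * 100), ncol = 100, dimnames = list(NULL, paste0("Bcol", 1:100)))
res <- cor_rows_cols(A, B)
head(res)
```

The estimates should agree with cor.test's rho for each pair; if exact p-values matter, spot-check a few pairs against cor.test before relying on the approximation.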