Smart way to loop over all rows in A and correlate with all columns in B
First post, so be gentle ;-)
I have a scenario where I want to correlate all rows of matrix A (approx. 50,000) with all columns of matrix B (approx. 100). I have solved this by doing:
output = c()
for( i in 1:nrow(A) ){
  for( j in 1:ncol(B) ){
    myTest = cor.test(A[i,],B[,j],method="spearman")
    output = rbind(output,c(rownames(A)[i],colnames(B)[j],
                            myTest$p.value,myTest$estimate))
  }
}
But it is hopelessly slow: it has been running for 30 hours and still has not finished.
There must be a smarter way? :-)
Cheers!
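Your code is slow mainly because of the rbind call: on every iteration it creates a new matrix and copies all the data from the previous one. This produces a huge overhead.
A simple solution is to create the matrix before the loop and then fill it: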
# Allocate the full result matrix once, before the loop
output = matrix(0, nrow=nrow(A)*ncol(B), ncol=4)
for( i in 1:nrow(A) ){
  for( j in 1:ncol(B) ){
    myTest = cor.test(A[i,],B[,j],method="spearman")
    # Row index for the (i, j) pair, filled in place
    output[(i-1)*ncol(B)+j,] = c(rownames(A)[i],colnames(B)[j],
                                 myTest$p.value,myTest$estimate)
  }
}
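One side effect worth noting: a matrix holds a single type, so putting the row and column names into the same matrix as the numbers turns the p-values and estimates into character strings. A minimal sketch of a variation, assuming you want to keep the numbers numeric, is to preallocate one vector per column and combine them into a data frame at the end. The toy matrix sizes and the names `row_name`, `col_name`, `p_value`, `rho` and `result` are made up for illustration.

```r
# Toy data standing in for the real matrices (sizes are illustrative assumptions)
set.seed(1)
A <- matrix(rnorm(5 * 20), nrow = 5,
            dimnames = list(paste0("Arow", 1:5), NULL))
B <- matrix(rnorm(20 * 3), nrow = 20,
            dimnames = list(NULL, paste0("Bcol", 1:3)))

n_pairs <- nrow(A) * ncol(B)

# Preallocate one vector per output column, each with its own type
row_name <- character(n_pairs)
col_name <- character(n_pairs)
p_value  <- numeric(n_pairs)
rho      <- numeric(n_pairs)

k <- 1
for (i in seq_len(nrow(A))) {
  for (j in seq_len(ncol(B))) {
    myTest <- cor.test(A[i, ], B[, j], method = "spearman")
    row_name[k] <- rownames(A)[i]
    col_name[k] <- colnames(B)[j]
    p_value[k]  <- myTest$p.value
    rho[k]      <- unname(myTest$estimate)
    k <- k + 1
  }
}

# Combine the typed columns; numbers stay numeric, names stay character
result <- data.frame(row_name, col_name, p_value, rho,
                     stringsAsFactors = FALSE)
```

With this layout, something like `result[result$p_value < 0.05, ]` works directly, without converting strings back to numbers first.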
OK, so I tried @Math's suggestion and decided to time it using the following code:
# Clear workspace
rm(list = ls())
# Reproducible results
set.seed(42)
# Set dimensions
n1 = 500
n2 = 150
n3 = 100
# Create matrices
A = matrix(rnorm(n1*n2),nrow=n1,ncol=n2)
B = matrix(rnorm(n2*n3),nrow=n2,ncol=n3)
# Assign row/col names
rownames(A)=paste("Arow",seq(1,nrow(A)),sep="")
colnames(A)=paste("Acol",seq(1,ncol(A)),sep="")
rownames(B)=paste("Brow",seq(1,nrow(B)),sep="")
colnames(B)=paste("Bcol",seq(1,ncol(B)),sep="")
# State number of correlations to be performed
cat(paste("Total number of correlations =",nrow(A)*ncol(B),"\n"))
# Test 1 using rbind()
cat("Starting test 1 with rbind()\n")
ptm = proc.time()
output = c()
for( i in 1:nrow(A) ){
  for( j in 1:ncol(B) ){
    myTest = cor.test(A[i,],B[,j],method="spearman")
    output = rbind(output,c(rownames(A)[i],colnames(B)[j],
                            myTest$p.value,myTest$estimate))
  }
}
print(proc.time() - ptm)
# Test 2 using pre-built matrix
cat("Starting test 2 with pre-built matrix\n")
ptm = proc.time()
output = matrix(0, nrow=nrow(A)*ncol(B), ncol=4)
count = 1
for( i in 1:nrow(A) ){
  for( j in 1:ncol(B) ){
    myTest = cor.test(A[i,],B[,j],method="spearman")
    output[count,] = c(rownames(A)[i],colnames(B)[j],
                       myTest$p.value,myTest$estimate)
    count = count + 1
  }
}
print(proc.time() - ptm)
Running this code produces the following results:
Total number of correlations = 50000
Starting test 1 with rbind()
user system elapsed
275.560 6.963 282.913
Starting test 2 with pre-built matrix
user system elapsed
29.869 0.218 30.114
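So there is clearly a big difference. I was not aware of this 'problem' with using the rbind() function to build a matrix incrementally. Thanks to @Math for pointing this out! :-)
Cheers!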
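Going one step beyond what was suggested in this thread: the loop itself may be avoidable, since `cor()` accepts two matrices and correlates every column of the first with every column of the second. The following is only a sketch under assumptions, not something taken from the answers here. In particular, the p-values use the large-sample t approximation, so they can differ slightly from what `cor.test()` reports with its defaults, and the names `rho`, `t_stat`, `p_value` and `result` are illustrative.

```r
# Same dimensions as the timing test above
set.seed(42)
A <- matrix(rnorm(500 * 150), nrow = 500)
B <- matrix(rnorm(150 * 100), nrow = 150)
rownames(A) <- paste0("Arow", seq_len(nrow(A)))
colnames(B) <- paste0("Bcol", seq_len(ncol(B)))

n <- ncol(A)  # number of observations behind each correlation

# cor() works column against column, so transpose A to turn its rows into columns.
# The result is an nrow(A) x ncol(B) matrix of Spearman estimates.
rho <- cor(t(A), B, method = "spearman")

# Assumption: two-sided p-values from the t approximation with n - 2 degrees of freedom
t_stat  <- rho * sqrt((n - 2) / (1 - rho^2))
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Long format, one row per (A row, B column) pair, in column-major order
result <- data.frame(
  row_name = rownames(A)[as.vector(row(rho))],
  col_name = colnames(B)[as.vector(col(rho))],
  p_value  = as.vector(p_value),
  rho      = as.vector(rho),
  stringsAsFactors = FALSE
)
```

Note that the rows of `result` come out in column-major order (all A rows for the first B column, then the second, and so on), which differs from the order produced by the nested loops.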