Conversion from pairwise matrix to Cytoscape edge table is too slow

Question

My code does something like this. Given a matrix like this one:

  a  b  c  d
a 1  NA 3  4
b NA 2  NA 4
c NA NA NA NA
d NA NA NA 4

it converts it to:

a  a  1
a  c  3
a  d  4
b  b  2
b  d  4
d  d  4

The relevant code is as follows:

pears <- read.delim("pears.txt", header = TRUE, sep = "\t", dec = ".")
edges <- NULL
for (i in 1:nrow(pears)) {
    for (j in 1:ncol(pears)) {
        if (!(is.na(pears[i,j]))) {
            edges <- rbind(edges, c(rownames(pears)[i], colnames(pears)[j], pears[i,j]))
        }
    }
    print(i)
}
colnames(edges) <- c("gene1", "gene2", "PCC")
write.table(edges, "edges.txt", row.names = FALSE, quote = FALSE, sep = "\t")

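When I run the code in the background on a remote server (using screen -S) on a 17804x17804 sparse (99% NA) matrix, it initially runs 5 print statements every 13 seconds. However, it has now slowed down to 7 print statements every minute. Why does the algorithm get slower as it progresses? Is there another way to convert my matrix into Cytoscape's format more quickly?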

Answer 1

We convert the data.frame to a matrix and use melt from reshape2, which returns the dimnames as two columns and the values as a third column. Passing na.rm = TRUE drops the NA rows in the same step:

library(reshape2)
melt(as.matrix(df1), na.rm = TRUE)
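
On the "why does it get slower" part: every call to rbind(edges, ...) copies the whole edges object built so far, so each new edge costs more than the previous one and the loop slows down as edges grows. Finding all non-NA cells in one vectorised step avoids this. The following is only a sketch of a base R alternative, not the melt approach above; the matrix m is a small stand-in with the same values as the example data, and for the real file you would presumably start from as.matrix(pears).

```r
# Small stand-in for the real matrix (filled column by column).
m <- matrix(c(1, NA, NA, NA,
              NA, 2, NA, NA,
              3, NA, NA, NA,
              4, 4, NA, 4),
            nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))

# Row/column positions of every non-NA cell, found in a single pass.
idx <- which(!is.na(m), arr.ind = TRUE)

# Build the whole edge table at once instead of growing it row by row.
edges <- data.frame(gene1 = rownames(m)[idx[, "row"]],
                    gene2 = colnames(m)[idx[, "col"]],
                    PCC   = m[idx],
                    stringsAsFactors = FALSE)

# which() scans column by column; reorder to match the loop's row-by-row output.
edges <- edges[order(idx[, "row"], idx[, "col"]), ]
rownames(edges) <- NULL
```

The resulting edges can be written out with the same write.table call as in the question.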

Data

df1 <- structure(list(a = c(1L, NA, NA, NA), b = c(NA, 2L, NA, NA),
  c = c(3L, NA, NA, NA), d = c(4L, 4L, NA, 4L)), class = "data.frame",
  row.names = c("a", "b", "c", "d"))

Tags: r, sparse-matrix, cytoscape